AD-A129 728


Research  Report  1332 


IMPROVING  THE  SELECTION, 
CLASSIFICATION,  AND  UTILIZATION  OF 
ARMY  ENLISTED  PERSONNEL 

PROJECT  A:  RESEARCH  PLAN 


Human  Resources  Research  Organization, 
American  Institutes  for  Research, 
Personnel  Decisions  Research  Institute, 
and 

Army  Research  Institute 


SELECTION  AND  CLASSIFICATION  TECHNICAL  AREA 




U.S.  Army 

Research  Institute  for  the  Behavioral  and  Social  Sciences 


May  1983 




Approved for public release; distribution unlimited.


U.  S.  ARMY  RESEARCH  INSTITUTE 

FOR  THE  BEHAVIORAL  AND  SOCIAL  SCIENCES 


A  Field  Operating  Agency  under  the  Jurisdiction  of  the 
Deputy  Chief  of  Staff  for  Personnel 


EDGAR  M.  JOHNSON 
Technical  Director 


L.  NEALE  COSBY 
Colonel,  IN 
Commander 


Technical  review  by: 

NEWELL  K.  EATON 
JOYCE L. SHIELDS


Notices 

DISTRIBUTION: Primary distribution of this report has been made by ARI. Please address correspondence
concerning distribution of reports to: U.S. Army Research Institute for the Behavioral and Social Sciences,
ATTN: PERI-TST, 5001 Eisenhower Avenue, Alexandria, Virginia 22333.

FINAL DISPOSITION: This report may be destroyed when it is no longer needed. Please do not return it
to the U.S. Army Research Institute for the Behavioral and Social Sciences.

NOTE:  The  findings  in  this  report  are  not  to  be  construed  as  an  official  Department  of  the  Army  position, 
unless  so  designated  by  other  authorized  documents. 


UNCLASSIFIED

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
READ INSTRUCTIONS BEFORE COMPLETING FORM

1. REPORT NUMBER: Research Report 1332

2. GOVT ACCESSION NO.

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle): Improving the Selection, Classification and
Utilization of Army Enlisted Personnel
Project A: Research Plan

5. TYPE OF REPORT & PERIOD COVERED: Research Report, Oct 82-Sep 89

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s): Human Resources Research Organization
American Institutes for Research
Personnel Decisions Research Institute
Army Research Institute

8. CONTRACT OR GRANT NUMBER(s): MDA903-82-C-0531

9. PERFORMING ORGANIZATION NAME AND ADDRESS:
Human Resources Research Organization
300 N. Washington Street
Alexandria, VA 22314

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: 2Q263731A792

11. CONTROLLING OFFICE NAME AND ADDRESS:
U.S. Army Research Institute for the
Behavioral and Social Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

12. REPORT DATE: May 1983

13. NUMBER OF PAGES: 456

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report): Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES: The Army Research Institute technical point of contact is
Dr. Newell K. Eaton. His telephone number is (202) 274-8275.

19. KEY WORDS: Validation, Predictors, Longitudinal Data Base, Validity Generalization,
Construct Validation, Army-wide Measures, Job Knowledge Tests, Predictor
Measures, Criterion Measures, Performance Measures.

20. ABSTRACT

This research report describes a research plan for a project whose objectives
are to: (1) validate current and future versions of the ASVAB against soldier
performance; (2) develop new selection and classification procedures and
measures to optimize the match between soldier abilities and MOS requirements;
and (3) develop computer-based decision aids for managers of the Army's
manpower processes.

The objectives of the research will be met by: (1) developing new ways to
measure and collect data on the military applicant pool; (2) developing and
evaluating new predictors of soldier performance (e.g., psychomotor,
perceptual, and cognitive abilities, and biographical information); (3) developing
new methods to measure and analyze training performance; and (4) developing and
refining adequate, efficient soldier performance measures and predictors of
enlisted personnel and NCO success.

DD FORM 1473 (1 JAN 73). EDITION OF 1 NOV 65 IS OBSOLETE.


Research  Report  1332 


IMPROVING  THE  SELECTION, 
CLASSIFICATION,  AND  UTILIZATION  OF 
ARMY  ENLISTED  PERSONNEL 


PROJECT  A:  RESEARCH  PLAN 


Human Resources Research Organization,
American Institutes for Research,
Personnel Decisions Research Institute,
and

Army Research Institute


Submitted  by: 

Newell  K.  Eaton,  Chief 
SELECTION  AND  CLASSIFICATION 
TECHNICAL  AREA 




Approved  if  technically  adequate 
and  submitted  for  publication  by: 

Joyce  L.  Shields,  Director 
MANPOWER  AND  PERSONNEL 
RESEARCH  LABORATORY 


U.S.  ARMY  RESEARCH  INSTITUTE  FOR  THE  BEHAVIORAL  AND  SOCIAL  SCIENCES 
5001 Eisenhower Avenue, Alexandria, Virginia 22333

Office, Deputy Chief of Staff for Personnel
Department  of  the  Army 


May  1983 


Army Project Number
2Q263731A792

Manpower and Personnel

Approved for public release; distribution unlimited.


FOREWORD 


This document describes a path toward achieving the goals of the Army's
current, large-scale manpower and personnel research effort for improving
the selection, classification, and utilization of Army enlisted personnel.
The thrust for the project came from the practical, professional, and legal
need to validate the Armed Services Vocational Aptitude Battery (ASVAB, the
current U.S. military selection/classification test battery) and other
selection variables as predictors of training and performance. The portion of
the effort described herein is devoted to the development and validation of
Army Selection and Classification Measures, and is referred to as "Project
A". This work is funded primarily by Army Project Number 2Q263731A792.
Another part of the effort is the development of a prototype Computerized
Personnel Allocation System, referred to as "Project B". Together, these
Army Research Institute research efforts, with their in-house and contract
components, comprise a landmark program to develop a state-of-the-art,
empirically validated personnel selection, classification, and allocation
system.


EDGAR  M.  JOHNSON 
Technical  Director,  ARI  and 
Chief  Psychologist,  U.  S.  Army 


PREFACE 



The planning for this research was initiated by the U.S. Army Research
Institute for the Behavioral and Social Sciences (ARI) in 1980. As
in-house resources were evaluated, it became apparent that the massive
scope of the effort would be best met by combining the talents of
research scientists and managers from ARI with those of contract research
organizations. In 1981, ARI in-house scientists set to work developing the
basic research requirements for the effort, and specified those for contract
research in a Statement of Work for "Projects A and B". These
requirements and specifications were developed by many scientists at ARI,
including Joyce Shields, Larry Hanser, Frances Grafton, Hilda Wing, Joseph
Zeidner, Newell Kent Eaton, Neil Dumas, and John Mellinger. Prior to project
resourcing and contract award on September 30, 1982, the research program
was coordinated extensively with the Departments of Defense, Air Force, and
Navy, and with the academic and scientific community. Three papers were
developed to present the research needs, concepts, and strategies. These
were prepared for the Defense Department, the Joint Services, and the
Defense Advisory Committee on Military Personnel Testing (Eaton, Wing,
Hanser, Dumas, and Shields, November 1981); the U.S. Army Policy Council
(Shields, February 1982); and the American Psychological Association 1982
Annual Meeting (Eaton and Shields, August 1982).

The primary goal of the project is the empirical demonstration of the
relationship of predictor tests to training and performance. To achieve
this goal, the most efficient and least disruptive procedure for assessing
soldier performance will be developed. Performance measures for soldiers
across all occupations, as well as occupation-specific measures, will be
developed and tested. A longitudinal research data base will be created
containing biographic data, aptitude/achievement indices, training and
soldier performance measures, and personnel ratings and actions for a large
sample of soldiers over a number of years. This data base will provide the
foundation for a system of empirically based quantitative relationships that
predict future performance from past performance, weighted according to
actual predictive ability at a given point in an individual's career.
For example, initial entry biographical and aptitude/achievement data may
be useful for early training decisions, while for mid-career decisions
training and initial job performance will be weighted more heavily. Actual
weights will be determined empirically.
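Weighting predictors "according to actual predictive ability" can be illustrated with ordinary least-squares regression weights. The sketch below is a minimal illustration, assuming two predictors (for example, an entry aptitude score and a training score) and a single criterion (a later job performance rating); the data values and the function names `pearson_r` and `standardized_weights` are hypothetical and do not come from this plan. It applies the standard two-predictor formulas for standardized (beta) weights computed from the three pairwise correlations.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def standardized_weights(r1y, r2y, r12):
    """Beta weights and R^2 for criterion y regressed on predictors 1 and 2,
    computed from the pairwise correlations (requires |r12| < 1)."""
    denom = 1.0 - r12 ** 2
    beta1 = (r1y - r2y * r12) / denom
    beta2 = (r2y - r1y * r12) / denom
    r_squared = beta1 * r1y + beta2 * r2y
    return beta1, beta2, r_squared


# Hypothetical records for six soldiers (illustration only).
aptitude = [50, 55, 60, 65, 70, 75]           # entry aptitude score
training = [70, 68, 80, 75, 85, 90]           # end-of-training score
job_rating = [3.0, 3.2, 3.6, 3.5, 4.1, 4.4]   # later job performance rating

r1y = pearson_r(aptitude, job_rating)
r2y = pearson_r(training, job_rating)
r12 = pearson_r(aptitude, training)

b1, b2, r2 = standardized_weights(r1y, r2y, r12)
print(f"aptitude weight={b1:.3f}, training weight={b2:.3f}, R^2={r2:.3f}")
```

At a mid-career decision point the same fit could be rerun with training and initial job performance as the predictors; in a real analysis the weights would be estimated from the longitudinal data base on far larger samples and cross-validated.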

A second, more ambitious goal is to optimize the match between applicants
and occupations. This effort incorporates Army priorities, supply forecasts,
and applicants' aptitudes, preferences, and predicted performance
capabilities. This goal will be accomplished by linking soldier performance
data with an allocation system developed by operations research/computer
science professionals using linear and goal programming techniques.
Individual and group data provided in the longitudinal data base will be
evaluated along with extensive projections of the characteristics of available
personnel resources and the Army's personnel requirements based on the
types, numbers, and variety of Army occupations. From this, an allocation
system will be developed to make the best match of individuals to occupations
on a near-real-time basis. Constraints include optimizing the system
from the Army's perspective, that is, filling critical occupations first,
making best use of individuals with unique skills or abilities, and
controlling costs.
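In its simplest form, the allocation problem described above is an assignment problem, which can be written as a linear program. As a hedged illustration only, the sketch below finds the optimal one-to-one assignment of a few applicants to MOS training slots by exhaustive search rather than by linear or goal programming; exhaustive search is a stand-in that is feasible only for toy problems. The predicted-performance values, the per-slot priority weights used to represent "filling critical occupations first," and the function name `best_assignment` are all hypothetical.

```python
from itertools import permutations


def best_assignment(predicted, priority):
    """Return the applicant-to-slot assignment that maximizes total
    priority-weighted predicted performance, found by exhaustive search.

    predicted[i][j]: hypothetical predicted performance of applicant i in slot j
    priority[j]:     hypothetical Army priority weight for slot j
    Assumes a square problem (one applicant per slot).
    """
    n = len(predicted)
    best_value, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        # perm[i] is the slot given to applicant i
        value = sum(priority[perm[i]] * predicted[i][perm[i]] for i in range(n))
        if value > best_value:
            best_value, best_perm = value, perm
    return best_perm, best_value


# Two hypothetical applicants and two hypothetical MOS slots.
predicted = [
    [0.90, 0.95],  # applicant 0: strong in both slots
    [0.60, 0.50],  # applicant 1: weaker overall
]

print(best_assignment(predicted, priority=[1, 1]))  # equal priorities
print(best_assignment(predicted, priority=[3, 1]))  # slot 0 treated as critical
```

With equal priorities the two applicants are best swapped (applicant 0 to slot 1); tripling the priority of slot 0 pulls the stronger applicant into that slot. A system of the kind the plan envisions would replace the search with a linear or goal programming solver and add supply, quota, and cost constraints.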



A consortium led by the Human Resources Research Organization (HumRRO) and
including the American Institutes for Research (AIR) and the Personnel
Decisions Research Institute (PDRI) was selected by ARI as the contract
organization offering the most innovative and creative approaches to meeting
the objectives of this project. This research plan represents the
contributions and efforts of a great many people, including the ARI
planners, Joint Service project evaluators, ARI research scientists and
project monitors, members of the Scientific Advisory Group, Interservice
Advisory Group, and Army General Officer Advisory Group, and the principal
staff of the consortium of contract research organizations.

Deserving of specific mention for continued stimulation, support, and
guidance are Lieutenant General Maxwell R. Thurman and Major General H.
Norman Schwarzkopf.

Members of the numerous groups who contributed their efforts in developing
this project were:

Joint  Service  Project  Evaluators: 

Dr. Hilda Wing, Dr. Lawrence Hanser, Dr. Bruce Gould,
Dr. Martin Wiskoff, Dr. John Mellinger, Dr. Paul Rossmeissl,
and Dr. Newell Kent Eaton.

Governance  Advisory  Group  Chairman: 

Major  General  H.  Norman  Schwarzkopf 

Interservice  Advisory  Group  members: 

Dr. G. Thomas Sicilia, Dr. Joyce L. Shields, Dr. Martin Wiskoff,
and Lieutenant Colonel J. ?. Amor.


Scientific  Advisory  Group  members: 




Dr. Phillip Bobko, Dr. Thomas Cook, Dr. Milton Hakel,
Dr. Lloyd Humphreys, Dr. Robert Linn, Dr. Mary Tenopyr,
and Dr. J. E. Uhlaner.

Army  General  Officer  Advisory  Group  members: 

Major General William O'Leksy, Major General Maurice O. Edmonds,
Brigadier General Gary E. Luck, and Brigadier General John W. Foss.


The staff of the Army Research Institute and the HumRRO Consortium responsible
for writing, monitoring, reviewing, editing, and approving the research
plan, and for executing the research:


Army  Research  Institute 

HumRRO  Consortium 

Newell  K.  Eaton 

Joyce  L.  Shields 

Marvin  H.  Goer 

John  P.  Campbell 
Robert  Sadacca 

James  H.  Harris 

Paul G. Rossmeissl

Task  1 

Donald  H.  McLaughlin 
Lauress  L.  Wise 
Ming-Mei Wang

Hilda  Wing 

Task  2 

Norman  G.  Peterson 

Rebecca  L.  Oxford-Carpenter 

Task  3 

Robert Vineberg

John  Joyner 

Lawrence  M.  Hanser 

Task  4 

Joseph  A.  Olmstead 
Walter  C.  Borman 
Barry  Riegelhaupt 

Michael  G.  Rumsey 

Task  5 

William  C.  Osborn 
George  Wheaton 

After months of negotiations, coordination meetings, staff meetings, advisory
group meetings, and reviews, this research plan now represents a roadmap
leading toward the accomplishment of the project goals. The recommendations
of all planners, evaluators, and advisors were considered, and most
were incorporated into the plan through the diligent and creative efforts
of the contractor staff. Compromises were legion, based on availability of
resources, personnel, soldiers to participate in the research, travel,
requisite development of scientific underpinnings, and legitimate differences
of opinion. Such compromises were negotiated and developed while
attending to the priorities, intentions, and needs of all concerned.

The troop support required is another example of compromise. The benefits of
the project were carefully weighed against the costs of the troop time
required. Four tasks required data collection: predictor measures, training
criteria, Army-wide criteria, and MOS-specific criteria development.
For each of these, four or five sets of data collection efforts involving troop
support were originally recommended. To use troop support more effectively,
and in part to strengthen the research design, some data collection
efforts have been merged across tasks in this plan. Troop support locations
cited in the plan are tentative recommendations to be coordinated
with appropriate organizations.

The plan is also a compromise with time. No research plan is ever complete
or unchanging; this one is no exception. It is a snapshot, representing
the best picture of the project from the perspective of spring 1983. It
is intended to be changed, updated, and improved over the years of the
project. Semi-annual meetings of the three advisory groups will yield
changing insights, strategies, and needs which will make the project more
responsive to pressing operational Army requirements and scientific
issues. Collected data will provide insight as to which pathways are proving
fruitful and which should be modified or terminated. It is our desire
that the project continue to evolve over the years through continued
healthy discourse among the Army's senior leadership, representatives of
the DoD and Joint Services, the scientific community, and the ARI and
contractor scientists. Our aims are: to provide the Army with a greatly
improved, empirically based personnel system responsive to the needs of the
service, while considering the unique abilities, interests, and desires of
individual soldiers; to complete this major project using the best techniques
in applied personnel selection and classification research; and to
substantially enhance scientific knowledge in the area.


NEWELL  KENT  EATON 

ARI  Principal  Scientist  and  COR 




Reference Notes:

Eaton, N.K., & Shields, J.L. U.S. Army soldier selection, classification
and utilization research program. Paper presented at the annual meeting
of the American Psychological Association, Washington, August 1982.

Eaton, N.K., Wing, H., Hanser, L.M., Dumas, N.S., & Shields, J.L.
Improving the selection, classification, and utilization of Army enlisted
personnel. Paper presented at the Department of Defense Advisory Committee
on Military Personnel Testing Meeting, San Antonio, November 1981.

Shields, J.L. The Army's personnel system. Paper presented at the Army
Policy Council Meeting, The Pentagon, February 1982.


RESEARCH  PLAN 


TABLE  OF  CONTENTS 

Page

INTRODUCTION  .  1 

Needed Improvements in the Army Selection and Classification
System . 3

Project  A:  Major  Tasks  . . . . . .  5 

General  Components  of  the  Research  Plan  .  9 

Validity  Generalization  . 26 

General  Outcomes  .  33 

Table  1:  MOS  Proposed  for  Project  A  .  12 

Table  2:  Characteristics  of  the  Proposed  MOS  Sample  .  13 

Table  3:  Estimated  Requirements  for  Measurement  of  83/84 
Cohort  by  MOS  .  19 

Figure  1:  Summary  of  Data  Collections  and  Samples  .  17 

Figure  2:  The  Overall  Data  Collection  Plan  .  18 

Figure 3: Percent of Troops Available for Performance
Measurement Research in a Typical MOS . 22

TASK  1  RESEARCH  PLAN  -  VALIDATION  OF  RELATIONSHIPS  AMONG  PREDICTORS 

AND  PERFORMANCE  MEASURES  .  1-1 

General  Purpose  . . .  1-1 

Background  Issues  and  Rationale  .  1-3 

Specific  Objectives  . . .  1-8 

Overall  Summary  of  the  Procedure  .  1-13 

Procedure  . . .  1-15 

Summary  of  Expected  Outcomes  . . .  1-68 

Table  1-1:  Project  Dates  for  Accomplishing  the  Specific  Objectives 
of  Task  1  .  1-9 

REFERENCES  .  1-73 

TASK 2 RESEARCH PLAN - PRE-INDUCTION PREDICTION OF ARMY SUCCESS . 2-1

General  Purpose  . . .  2-1 

Background  Issues  and  Rationale  .  2-3 

Specific  Objectives  . . .  2-13 

Overall  Summary  of  the  Procedure  .  2-15 

Procedure  .  2-26 

Summary  of  Expected  Outcomes  from  Task  2  .  2-88 

Figure  2-1:  Timetable  for  Task  2  . . .  2-16 

Figure  2-2:  Relationships  of  Task  2  Subtasks  and  Inputs 
from  Other  Project  A  Tasks  . .  2-17 

REFERENCES  .  2-93 



TASK 3 RESEARCH PLAN - MEASUREMENT OF SCHOOL/TRAINING PERFORMANCE . 3-1


General  Purpose  of  Task  3  . . .  3-1 

Background  Issues  and  Rationale  .  3-3 

Objectives  . 3-11 

Overall  Summary  of  the  Procedure  .. . 3-13 

Procedures  .  3-19 

Summary  of  Expected  Outcomes  from  Task  3  . . .  3-62 

Figure  3-1:  Task  3  Schedule  .  3-18 

Table  3-1:  SME  and  Test  Subject  Support  Requirements  .  3-16 

Table 3-2: MOS with 500 Records in ARI Data Base . 3-23

REFERENCES  .  3-65 


TASK  4  RESEARCH  PLAN  -  MEASUREMENT  OF  ARMY-WIDE  PERFORMANCE  .  4-1 

General  Purpose  of  Task  4  .  4-1 

Background  Issues  and  Rationale  .  4-3 

Specific  Objectives  .  4-11 

Overall  Summary  of  the  Procedure  .  4-12 

Procedure  .  4-17 

Summary  of  Expected  Outcomes  from  Task  4  .  4-72 

Figure  4-1:  Task  4  Milestone  Chart  .  4-16 

Figure  4-2:  Behavior  Summary  Scale  for  Job  of  Navy  Recruiter  ....  4-24 

REFERENCES  .  4-85 


TASK  5  RESEARCH  PLAN  -  MEASUREMENT  OF  MOS-SPECIFIC  PERFORMANCE  .  5-1 

General  Purpose  of  Task  5  . . .  5-1 

Background  Issues  and  Rationale  .  5-2 

Specific  Objectives  . 5-15 

Overall  Summary  of  Procedure  . .  5-16 

Procedure  . 5-20 

Summary of Expected Outcomes . 5-85

Figure  5-1:  Task  5  Schedule .  5-17 

Table  5.2.1:  Soldier  Support  Requirements  for  MOS  A 

Job-Task  and  Behavioral  Analyses  .  5-40 

Table  5.2.2:  Soldier  Support  Requirements  for  MOS  B  Job-Task 

and  Behavioral  Analyses  .  5-41 

Table 5.2.3: Soldier Support Requirements for MOS A'

Job-Task  Analyses  . 5-42 

Table  5.2.4:  Soldier  Support  Requirements  for  MOS  B' 

Job-Task  Analyses  . . 5-42 


Table 5.3.1: Soldier Support Requirements for Developing

MOS A Performance Measures

Table 5.3.2: Possible Equipment Support Requirements for

Developing MOS A Performance Measures

Table 5.3.3: Soldier Support Requirements for Developing

MOS B, A', and B' Performance Measures

Table 5.5.1: Soldier Support Requirements for MOS A Field Test

Table 5.5.2: Soldier Support Requirements for MOS B Field Test

Table 5.5.3: Soldier Support Requirements for MOS A' Field Test

Table 5.5.4: Soldier Support Requirements for MOS B' Field Test

Table 5.7.1: Soldier Support Requirements for Cohort Test I

Table 5.7.2: Soldier Support Requirements for Cohort Test II

Table 5.7.3: Soldier Support Requirements for Cohort Test III

REFERENCES . 5-95


INTRODUCTION 


The overall purpose of the research projects for Improving the Selection,
Classification, and Utilization of Army Enlisted Personnel is to enhance
the Army's ability to accomplish its peacetime and mobilization missions
through improved matching of individuals to military occupational
specialties. Toward this goal, the Army Selection and Classification
Measures Development and Validation Project (Project A) is devoted to the
development of an expanded and comprehensive selection/classification test
battery and the validation of that test battery against a full array of
existing and newly developed criteria. Specifically, Project A is to:

o validate existing selection measures against both
existing and project-developed criteria, the latter
to include both Army-wide performance measures based
on newly developed rating scales and direct measures
of MOS-specific task performance;

o develop and validate new and/or improved selection
and classification measures;

o validate proximal criteria, such as performance in
training, as predictors of later criteria, such as
job performance ratings, so that more informed
reassignment and promotion decisions can be made
throughout the individual's tour;

o determine the relative utility to the Army of different
performance levels across MOS; and

o estimate the relative effectiveness of alternative
selection and classification procedures in terms of
their validity and utility for making operational
selection and classification decisions.

Project A is criterion-driven. Its coherence derives from the fact that
all of its substantive tasks focus on a single domain, which we can label
"effective performance in the Army." Project A must define (state the
dimensions or components of) that domain, measure (develop operational
criteria for) that domain, and predict (specify the prior information
relevant to) that domain. All of the activities of Tasks 1 through 5 must be
driven by, and made comprehensible in terms of, the performance components
that constitute the domain of interest. The project must not be viewed
or conducted as a set of separable tasks that make "inputs" to
one another and that are to be "integrated" somehow. Such a view misses
the essential unity of the effort; Project A is one project.


NEEDED IMPROVEMENTS IN THE ARMY SELECTION AND CLASSIFICATION SYSTEM


The current Army personnel system has a number of deficiencies which must
be addressed in Project A:

1. Predictors covering the full range of the performance domain or
criteria space are lacking. Currently available measures focus chiefly
on cognitive abilities. Relevant non-cognitive measures, such as
psychomotor/perceptual coordination, vocational interest, and
biographical indices, need to be developed and their usefulness in
predicting aspects of Army-wide and MOS-specific performance
determined.

2. Measures of job performance are lacking. Current measures of job
proficiency (SQT) are designed primarily as diagnostic training tools
rather than as indicators of successful job performance.

3. The selection and classification of individuals are based on the
relationship of entrance tests to performance in training, not
performance on the job.

4. The Army does not have the data system needed to make critical personnel
decisions throughout a soldier's life cycle based on individual job
performance and the needs and priorities of the Army.

5. Currently, if an applicant chooses a specific training program and
meets the minimum aptitude requirements, he or she is placed into that
training if an opening exists. This procedure does not take into
account where that individual could best serve the needs of the Army or
even where that individual could be most successful in the Army.

6. The Army does not have efficient means of expressing needs and policies
in terms of personnel goals, constraints, and trade-offs. A dynamic,
adaptive, self-adjusting system that supports Army management decision-making
is required.

These deficiencies stem chiefly from the dynamics of the
labor market, the new requirements produced by emerging weapon systems, and
the inevitable lag of an operational system behind the most recent
technological advances in testing and personnel decision making.




PROJECT  A:  MAJOR  TASKS 


Project A is organized into five major research tasks:

Task  1.  Validation 

Task 1 has two major components. The first is to maintain the data base
and provide the analytic procedures to determine the degree to which
performance in Army jobs is predictable from some combination of new or
existing measures. The second is to determine whether the existing set of
predictors, new predictors, or some combination of new and existing
predictors has utility over and above the present system. These two components
must be accomplished in light of state-of-the-art technology in personnel
selection research.

This task encompasses the integrated analysis of all data generated through
research activities in the other tasks. While separate teams will be
collecting and analyzing subsets of data in accomplishing their tasks,
personnel working on this task will be analyzing combined data files. A
longitudinal Research Data Base will be developed and maintained as part
of Task 1.

Task  2.  Prediction  of  Job  Performance 

To date, a large proportion of the efforts of the armed services in this
area has been concentrated on improving the ASVAB. The ASVAB and other
existing test batteries are primarily indicators of skills that require
cognitive abilities. However, many critical Army tasks appear to require
psychomotor and perceptual skills for their successful performance. It is
perhaps in such non-cognitive domains that the greatest potential for
adding valid, independent dimensions to current classification instruments
is to be found.

The research plan includes identifying, reviewing, and evaluating instruments
and variables that may be used at enlistment for predicting Army
success.

A critical aspect of this task is the demonstration of the incremental
validity added by new predictors. While it may be necessary to rely on
content and construct methods of validity in the development phase of new
predictors, the Army requires criterion-related field research as support
for generalization and extension of findings. Prior to acceptance and use
of any new predictors, there will be a clear demonstration that they add
validity beyond that provided by current predictors and that the cost of
test administration, scoring, etc., is justified by the value of the
information provided.
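The incremental-validity requirement above is commonly expressed as the gain in squared multiple correlation (ΔR²) when a new predictor is added to the existing ones. A minimal sketch, assuming a single existing predictor (for example, an ASVAB composite) and a single new one (for example, a psychomotor test); the correlation values and the function names `r_squared_two` and `incremental_validity` are hypothetical. It uses the standard two-predictor formula R² = (r1y² + r2y² − 2·r1y·r2y·r12) / (1 − r12²).

```python
def r_squared_two(r1y, r2y, r12):
    """Squared multiple correlation of criterion y on predictors 1 and 2,
    computed from the three pairwise Pearson correlations (|r12| < 1)."""
    return (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)


def incremental_validity(r_existing, r_new, r_between):
    """Gain in R^2 from adding a new predictor to an existing one."""
    baseline = r_existing ** 2  # R^2 with the existing predictor alone
    full = r_squared_two(r_existing, r_new, r_between)
    return full - baseline


# Hypothetical correlations, for illustration only:
#   existing cognitive composite vs. job performance: 0.50
#   new psychomotor measure vs. job performance:      0.30
#   correlation between the two predictors:           0.20
delta = incremental_validity(0.50, 0.30, 0.20)
print(f"Incremental R^2: {delta:.3f}")
```

With these illustrative values the script prints an increment of 0.042; the new measure adds validity largely because it overlaps little with the existing predictor. In practice the increment would also be tested for statistical significance, corrected for shrinkage, and weighed against testing cost, as the plan requires.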

Task 3. Measurement of School/Training Success

The objective of Task 3 is to derive school and training performance
indices that can be used as 1) criteria against which to validate the
initial predictors, and 2) predictors of later job performance. Insofar
as possible, these measures will provide information regarding the relative
standing of trainees both within and across training programs.

The general scope of this task is to evaluate currently available measures
and, if necessary, to revise them or develop new measures. Comprehensive
job knowledge tests will be developed for the sample of MOS investigated,
and their content and construct validity will be determined.

Task 4. Assessment of Army-Wide Performance

In contrast to performance measures that may be developed for a specific
Army MOS, Task 4 will develop measures that can be used across all MOS
(i.e., Army-wide). That is, the intent is to develop measures of first- and
second-tour job performance against which all Army enlisted personnel may
be measured. What is being measured might be termed "soldiering." In
fact, a major objective for Task 4 is to develop a model of soldier
effectiveness that specifies the major dimensions of an individual's
contribution to the Army as an organization. Another important objective of
Task 4 is to develop measures of utility that can be used to scale various
performance levels across different MOS.

Task  5.  Develop  MOS  Specific  Performance  Measures 

The focus of Task 5 is the development of reliable and valid measures of
specific job task performance for 9 selected MOS (out of a sample of 19
MOS). This task may be thought of as consisting of three major components:
job analysis, construction of job performance measures, and construct
validation of the new measures.

While  only  19  MOS  will  be  analyzed  during  this  project,  the  Army  may  in  the 
future  wish  to  develop  job  performance  measures  for  a  larger  number  of  MOS. 




For this reason, the methods are intended to be generalizable to all Army
MOS. Also, the Army must be able, using its existing personnel and
resources, to carry out the developed techniques on a regular, recurring
basis. Finally, the analyses must provide for the establishment of
performance standards at both minimum and higher levels.




GENERAL  COMPONENTS  OF  THE  RESEARCH  PLAN 


The specific issues and procedural steps comprising this research effort
are detailed in the sections corresponding to Tasks 1-5. However, the full
project is organized around a number of major data collection efforts that
provide information for several of the tasks simultaneously. They will be
large and expensive, but they are fundamental to the success of the
project. Without them the Army cannot realize its goals for developing a
comprehensive selection and classification system that is statistically,
psychometrically, and operationally sound. Consequently, before proceeding
to the more detailed research plans, these major procedural components will
be described.

Sampling  Considerations 

There are two sampling considerations. First, we shall select a sample of
MOS from the universe of possible MOS; then we shall obtain samples of
enlisted personnel (EP) within each MOS. The MOS are the primary sampling
units. Large and representative EP samples are important mainly to the
extent that they enhance the stability of the statistical results obtained
for the sample MOS.

There is a trade-off in the allocation of project resources between the
number of MOS researched and the number of subjects tracked within each
MOS: the more MOS investigated, the fewer subjects per MOS can be tested,
and vice versa.
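The trade-off can be made concrete with the usual large-sample approximation to the standard error of a correlation, SE(r) ≈ (1 - r²)/√(n - 1). The sketch assumes a hypothetical fixed budget of 20,000 tested subjects split evenly across MOS and a true validity of .30; neither figure comes from the plan.

```python
import math


def se_correlation(r, n):
    """Large-sample approximation to the standard error of a correlation."""
    return (1.0 - r * r) / math.sqrt(n - 1)


def tradeoff(total_subjects, mos_counts, r=0.30):
    """Rows of (number of MOS, subjects per MOS, approximate SE of r) for a fixed total."""
    rows = []
    for k in mos_counts:
        n = total_subjects // k  # even split of the fixed total across k MOS
        rows.append((k, n, se_correlation(r, n)))
    return rows


for k, n, se in tradeoff(20_000, [5, 10, 19, 40]):
    print(f"{k:3d} MOS  {n:5d} per MOS  SE(r) = {se:.4f}")
```

Each step toward more MOS buys broader coverage at the price of a larger standard error for every within-MOS estimate.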




We propose to collect new data for a sample of 19 MOS. To samples from all
19, we will administer the new predictors (from Task 2) and collect the
school and Army-wide performance data (of Tasks 3 and 4). To 9 of these
MOS, we will also administer the MOS-specific performance measures
developed in Task 5. The 9 MOS will be chosen to provide maximum coverage
of the total array of knowledge, ability, and skill requirements of Army
jobs, given certain statistical constraints.

MOS  Selection 

Unfortunately, painstaking examination by Task 5 personnel since the start
of the contract has not provided sufficient data to permit a confident
judgment that any particular sample of MOS is representative of the
population of MOS for which personnel decisions must be made. To support such a
judgment, one would need job analysis information on the similarities in
job requirements and/or job tasks across MOS such that MOS could be
clustered into maximally homogeneous subgroups. Since such data do not exist
in any systematic form, a first sample of MOS has been drawn by using the
following considerations:

1) High-density MOS that would provide sufficient sample
sizes for statistically reliable estimates of new
predictor validity and differential validity across
racial and gender groups.

2) Representative coverage of the aptitude areas measured by
the ASVAB area composites.

3) High-priority MOS (as rated by the Army in the event of a
national emergency).

4) Representation of the Army's designated Career Management
Fields (CMF).

5) Adequate representation of the types of jobs required to
accomplish the Army's mission.

A much more complete specification of the procedure that was used is given
in the description of Task 5. However, a summary list of the MOS that were
selected is given in Table 1. Summary characteristics of the proposed
sample are given in Table 2.

From this list, 4 MOS were identified by ARI and Project A staff and the
Project A Governance Advisory Group as encompassing a wide range of job
characteristics and as being unlikely to be eliminated as the result of
gathering further data. These MOS are indicated in Table 1 by an
asterisk. They constitute the MOS from which the FY83/84 longitudinal
sample will be drawn. Work has already begun on the development of
performance measures for these MOS.

Again, within the limits of currently available information, this array of
MOS represents, as well as it can, a) the full population of MOS for which new
classification measures would be used, b) the range of aptitudes currently
used to make selection decisions, c) high-priority MOS, d) MOS that are
projected to increase in density, and e) MOS that contain enough people to
permit stable estimates of alternative prediction equations and
differential validities across racial and gender groups and across MOS.

Without sufficient precision in the statistical estimates, none of the other
questions can be answered. It is particularly crucial that questions of
racial and gender fairness be thoroughly explored at the outset.


[Table 1: summary list of the selected MOS (* = MOS for the FY83/84 longitudinal sample)]


Table 2

CHARACTERISTICS OF THE
PROPOSED MOS SAMPLE

            FY81          PERCENT     MOS         PERCENT
            ACCESSIONS    OF TOTAL    SAMPLE      OF TOTAL

TOTAL       133,192       -           58,591*     -
FEMALE       19,757       14.8         8,609      14.7
BLACK        36,034       27.0        16,001      27.3
HISPANIC      6,416        4.8         2,758       4.7

*SAMPLE = 44% OF TOTAL ACCESSIONS
REGULAR ARMY ONLY
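The percentage columns of Table 2 can be rechecked from the counts. The short check below takes the sample total as 58,591, the value consistent with the footnoted 44% and with all three subgroup percentages; recomputed shares agree with the published ones to within 0.1 percentage point.

```python
# Counts from Table 2 (FY81 Regular Army accessions vs. the proposed MOS sample).
fy81 = {"TOTAL": 133_192, "FEMALE": 19_757, "BLACK": 36_034, "HISPANIC": 6_416}
sample = {"TOTAL": 58_591, "FEMALE": 8_609, "BLACK": 16_001, "HISPANIC": 2_758}


def percent_of_total(counts):
    """Percentage of the TOTAL row accounted for by each subgroup."""
    total = counts["TOTAL"]
    return {group: 100.0 * n / total for group, n in counts.items() if group != "TOTAL"}


for label, counts in (("FY81", fy81), ("Sample", sample)):
    shares = percent_of_total(counts)
    print(label, {group: f"{pct:.1f}" for group, pct in shares.items()})

print(f"Sample share of accessions: {100.0 * sample['TOTAL'] / fy81['TOTAL']:.0f}%")
```

The near-identical FY81 and sample percentages are what support the claim that the proposed MOS sample mirrors the accession population on gender and race.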




Consequently, the selection of the first sample of MOS began with a
consideration of the number of people in each MOS for a particular cohort
and then tried to maximize coverage of CMF and aptitude areas.

However, besides being statistically reliable, the estimates of selection
and classification equations based on data from the 19 MOS must also be
evaluated in terms of how appropriately they can be used to make selection
and classification decisions for MOS not among the 19 to be researched.
This is the classic problem of validity generalization. That is, given
empirical validation data for some specific set of jobs (MOS in our case),
to what extent can these data be generalized to estimate the validity of
the selection measures for jobs (MOS) that have not been analyzed? There
are over 250 enlisted MOS in the Army for which selection and
classification decisions must be made, yet Project A can empirically
validate new selection and classification measures in only a small subset
of them. There is no perfect way to select a set of MOS that precisely
maximizes the degree of validity generalization. The problem must be
approached by multiple methods over the course of the Project. The methods
to be used will be described below, after the general nature of the data
collection has been described.

The FY81/82 Cohort

In addition to collecting data from new samples, the project will make use
of existing file data that have been, or can be, accumulated for 1981 and
1982 accessions. The editing and merging of data from the accessions and
EMF files for entry into the Longitudinal Research Data Base (LRDB) is
already well along and will be ready for analyses beginning in late March
or early April 1983. The overall objective is to accumulate as much data
as possible on available predictors and available criteria. Henceforth,
this source of data will be known as the FY81/82 cohort.

There  are  several  factors  that  argue  for  an  extensive  analysis  of  the 
available  file  data  for  the  FY81/82  cohort: 

o These are the best data currently available for
evaluating the validity of the current form of the
ASVAB (8, 9, 10). Therefore, there are a number of
basic validation questions for which the EMF and
accessions file should be useful (e.g., how does the
validity of the existing aptitude area scores compare
to that of alternative composites derived from the ASVAB
subtests?).

o If training, EER, SQT, or other archival data are
available in sufficient quantity and quality to
constitute usable criteria, then the file data can be
used as a benchmark against which to compare the
incremental validity generated by Project A. That
is, for the current predictors and the available
criteria, test validities, composite validities,
differential validity across groups (e.g., race), and
differential validity across MOS (i.e., validity
generalizability) can be determined. The question
is then how much these indices change when the new
experimental battery is tried out with the broader
range of criteria.

o Analysis of the FY81/82 cohort will allow us to try
out a number of new analytic techniques so as to
determine if they will be useful in later phases of
the project. For example, simultaneous estimation
techniques could be used to determine how many
significantly different regression equations are
needed to predict criterion scores in different MOS.
Also, empirical Bayesian techniques could be used to
estimate the common regression line across MOS or
across cohorts.
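As an illustration of the empirical Bayes idea, the sketch below shrinks each MOS's observed validity coefficient toward the sample-size-weighted mean across MOS, in proportion to how much of the observed spread looks like sampling error. It works on single validity coefficients rather than full regression lines, uses a simple method-of-moments estimate of the between-MOS variance, and runs on hypothetical values. It is one of several possible formulations, not the plan's specified technique.

```python
def eb_shrink(validities, sizes):
    """Shrink per-MOS validities toward their weighted mean (method-of-moments empirical Bayes).

    Returns (weighted mean validity, estimated between-MOS variance, shrunken validities).
    """
    total_n = sum(sizes)
    r_bar = sum(r * n for r, n in zip(validities, sizes)) / total_n
    # Approximate sampling variance of each observed coefficient.
    sampling_var = [(1 - r_bar ** 2) ** 2 / (n - 1) for n in sizes]
    # Observed (n-weighted) variance of the coefficients across MOS.
    observed_var = sum(n * (r - r_bar) ** 2 for r, n in zip(validities, sizes)) / total_n
    # Between-MOS ("true") variance: whatever spread sampling error does not explain.
    tau2 = max(0.0, observed_var - sum(sampling_var) / len(sampling_var))
    shrunk = []
    for r, v in zip(validities, sampling_var):
        weight = tau2 / (tau2 + v)  # more weight on an MOS's own r when its n is large
        shrunk.append(weight * r + (1 - weight) * r_bar)
    return r_bar, tau2, shrunk


# Hypothetical observed validities and sample sizes for four MOS.
validities = [0.25, 0.40, 0.55, 0.30]
sizes = [100, 800, 120, 500]
r_bar, tau2, shrunk = eb_shrink(validities, sizes)
print(f"mean r = {r_bar:.3f}, tau^2 = {tau2:.4f}")
for r, s in zip(validities, shrunk):
    print(f"observed {r:.2f} -> shrunken {s:.3f}")
```

When all the observed spread is attributable to sampling error, the estimated between-MOS variance is zero and every MOS is assigned the common value, which corresponds to a single equation across MOS.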

Collection  of  New  Data  Within  MOS 

There will be five major new data collections involving three major
samples. These furnish much of the information to be used to answer the
specific questions posed in the following sections. The sample composition
designates subjects by federal fiscal year of entry into the Army. The
schedule and the types of data collected for each sample are shown in
Figure 1.




Figure 1

Summary of Data Collections and Samples

                                               Sample
Data Collection        FY83/84              FY83/84              FY86/87
                       (Longitudinal)       (Concurrent)         (Longitudinal)

(1) 10/1/83-6/30/84    Preliminary Battery

(2) 6/1/85-9/30/85     Trial Battery        Trial Battery
                       1st Tour Criteria    1st Tour Criteria

(3) 3/1/86-2/28/87                                               Experimental Battery

(4) 6/1/88-9/30/88     2nd Tour Criteria    2nd Tour Criteria    1st Tour Criteria

(5) 2/1/91-3/31/91                                               2nd Tour Criteria

Sample sizes and use of subjects discussed in the following sections and in
the individual task descriptions reflect a standardized approach, rather
than differentiating sample sizes by MOS in detail every time a data
collection is described. This is done in the interest of clarity and
economy of discussion. Specific sample requirements, by MOS and by
utilization, will be detailed in each Troop Support Request and are currently
estimated for the FY83/84 cohort in Table 3 in this Introduction.

A  schematic  of  the  data  collection  plan  is  shown  as  Figure  2. 

Data  Collection  1 

This first major data collection follows a longitudinal design. New
recruits will be tested with a preliminary predictor battery, developed in
Task 2, beginning in the late summer or early fall of 1983 and continuing
until the summer of 1984. The recruits will be sampled from 4 MOS
(05C, 19E/K, 63B, 71L). The principal criterion data will be training
school achievement measures (developed in Task 3) administered as enlistees
pass through their training courses. However, the criterion administration
sites for the FY83/84 concurrent sample will later be chosen so as to
maximize the probability that an individual in the FY83/84 longitudinal
sample will also fall into the FY83/84 concurrent (first-tour) sample,
which would make additional criterion measures available. Since the data
collection will constitute a major test of whether previously developed
predictors from major domains not covered by ASVAB can add to the
prediction of training school grades and other available criteria, a large
number of cases will be needed (see Table 3).

[Table: ESTIMATED REQUIREMENTS FOR MEASUREMENT OF 83/84 COHORT ...]

Data  Collection  2 

The collection of data on new predictors, job knowledge tests, and the
Army-wide and MOS-specific performance measures will be accomplished in a
large field administration of these instruments to the FY83/84 cohort
(first tour) during 6/85-10/85. The target will be to collect data on the new
predictors, job knowledge tests, and Army-wide performance measures for an
average of 500 enlisted personnel (EP) in each of the 19 MOS identified
earlier, and to collect data on MOS-specific measures for the EP in the
9 MOS of this group for which hands-on instruments will be constructed
initially. These data would be used, along with the existing preinduction
test scores, school grades, and behavioral indices already in the cohort
data base, to validate the ASVAB and other existing measures, conduct a
concurrent validation of the new predictors and proximal criteria, improve the
psychometric quality of the new instruments, help guide further instrument
development, and select the most promising new predictors for
administration to the FY86/87 cohort.


Data  Collection  3 

A longitudinal prediction sample will be collected from the FY86/87 cohort
by testing recruits with the revised predictor battery and obtaining school
data, beginning in March 1986 and continuing until February 1987.
Recruits will be sampled from the 19 focal MOS. (Data may be collected
from additional MOS in order to allow better validity generalization from
the sample to the population of MOS.) Since this sample will be followed
up for the purpose of collecting criterion information once during 1988
(first tour) and again during 1991 (second tour), the expected attrition in
the sample will be considerable. The expected attrition for a typical MOS is
shown in Figure 3. Because of this attrition, it is highly desirable that
about 2200 recruits be tested from each MOS, on average. There will most
likely not be that many accessions per year for all MOS. In MOS with fewer
accessions, we need to obtain as many of the available recruits as
possible.

Data  Collection  4 

During the period June 1988 through September 1988, Army-wide and
MOS-specific performance measures will be collected at 12 to 15 sites from the
FY83/84 cohort, which will be in its second tour, and the FY86/87 cohort,
which will be in its first tour.

Data  Collection  5 

From January 1991 to March 1991, Army-wide and MOS-specific criterion data
will be obtained from the FY86/87 cohort, which will be in its second tour.


[Figure 3: PERCENT OF TROOPS AVAILABLE FOR PERFORMANCE MEASUREMENT RESEARCH IN A ...]


The magnitude of the above data collection may seem large. However, it
is dictated by the following considerations:


o The overriding goal is to develop a comprehensive
selection and classification system that will be
implemented across all non-classified enlisted MOS
that are associated with advanced instructional
training. Consequently, the different parts of the
system cannot be studied piecemeal. If the system
and its connections are not studied as a whole, it will
not be possible to develop the optimal set of
preinduction tests, performance measures, and algorithms
that link the parts. We must have a large amount of
information on each person, and this means that sample
sizes must be large to ensure statistical
reliability.

o It is necessary to examine the differences in
regressions, correlations, and other statistical
indices between gender groups, racial groups, MOS,
etc. As has been frequently demonstrated, testing
differences between regression and/or correlation
coefficients requires very large sample sizes.

o It is necessary, for implementation of the selection
and classification system, to draw conclusions about
the level of validity for each MOS. Thus, each MOS
that is included must have a sufficient sample size
to support reliable statistical conclusions. Since the
Army is a large organization, the MOS that are
researched must be representative of the full
range of jobs.
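Why testing differences between correlations demands large samples can be seen from a standard power calculation based on Fisher's z transformation. The sketch assumes two independent groups of equal size (for example, two racial or gender groups), a two-sided test at alpha = .05, and power of .80; these are conventional choices, not values taken from the plan, and the validity pairs are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(rho1, rho2, alpha=0.05, power=0.80):
    """Approximate per-group n to detect rho1 != rho2 (two independent groups, equal n).

    Uses Fisher's z transform, z = atanh(r), with Var(z1 - z2) = 2 / (n - 3).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    q = abs(math.atanh(rho1) - math.atanh(rho2))  # effect size on the z scale
    return math.ceil(2 * ((z_alpha + z_beta) / q) ** 2 + 3)


for r1, r2 in [(0.30, 0.10), (0.30, 0.20), (0.30, 0.25)]:
    print(f"validity {r1:.2f} vs {r2:.2f}: about {n_per_group(r1, r2)} per group")
```

Halving the difference to be detected roughly quadruples the required sample, which is why subgroup comparisons drive the sample sizes upward.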


There is considerable attrition from the sample as the cohort moves through
its tour. The attrition can be summarized by the following points.


1) A certain percentage of recruits who begin AIT will
not finish. Attrition during training is not random,
either by MOS or by ability level within MOS.

2) Of those who finish their AIT, a certain percentage
will attrite during the first 1-2 years of their
tour.

3) Since, for purposes of this project, the criterion
assessment of people must take place on a relatively
small number of installations, not all of the sample
will be found on those bases (some will be scattered
across a much larger number), and a further reduction
in the sample will occur.

4) It is also true that during a given time period, at a
given base, not all of the people in the sample will
actually be available for testing (e.g., due to
leave, illness, etc.), and additional shrinkage in
the sample will occur.

5) Only a small proportion of the original sample will
re-enlist and be available for the second-tour
measures.

6) Of those who re-enlist, only a certain percentage
will be on the bases where the testing is taking
place at any designated time and be available for
testing.

7) The attrition rates over the various stages in a
soldier's tour, from AIT to reenlistment, are not the
same for all MOS. In fact, they vary a great deal,
which makes the process of sample selection
difficult.


Estimates of attrition and sample shrinkage for the MOS listed in Table 1
are shown in Table 3 and Figure 2. The estimates are based on actual
figures for previous or current accessions. As such, they constitute our
best estimate of how these decay functions will look in the future. The
initial samples that are required can then be generated by working
backwards from the sample sizes that are necessary to provide a minimum
level of statistical reliability at the crucial data collection points.
The specific sample sizes for each MOS for each major data collection were
generated in this way.
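Working backwards amounts to dividing the sample needed at a data collection point by the product of the stage-by-stage retention rates. The sketch below illustrates the arithmetic. The stage labels loosely follow the points listed above, but the rates are hypothetical and are not the estimates in Table 3. The target of 500 echoes the average per-MOS figure given for Data Collection 2.

```python
import math


def required_initial_sample(target, retention_rates):
    """Work backwards from the n needed at a data collection point to the n to test at entry."""
    surviving = math.prod(retention_rates)  # fraction of entrants still testable
    return math.ceil(target / surviving), surviving


# Hypothetical stage-by-stage retention rates (illustrative only, NOT project estimates).
stages = {
    "completes AIT": 0.85,
    "remains through first 1-2 years": 0.80,
    "stationed at a testing installation": 0.60,
    "present during the testing window": 0.85,
}

n_entry, fraction = required_initial_sample(500, stages.values())
print(f"{fraction:.3f} of entrants reach first-tour testing; test about {n_entry} at entry")
```

Adding the re-enlistment and second-tour availability stages to the product shows how quickly the entry sample must grow to support second-tour measurement.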


In sum, the aim of the project is to develop an organization-wide system
for a very important function. In an organization as large and as varied
as the Army, there is no way this can be done on a small scale.




VALIDITY  GENERALIZATION 


Before the Computerized Allocation System (CPAS) can become operational,
the appropriate parameters in the selection and classification model must
be estimated for each MOS in the system. The parameters of interest are
the choice of specific tests and the relative weight for each test that
will be used to obtain a predicted performance score in a particular MOS
for a particular individual recruit. As noted previously, this will
involve parameter estimates for over 250 MOS. However, Project A can collect
empirical validation data on only 19 MOS. How, then, can the empirical
estimates for 19 MOS be generalized to 250+ MOS?

This is not a problem unique to the Army. It arises any time an
organization seeks to use a selection test or prediction equation beyond the
specific kind of job or situation for which it was validated. Since in any
complex organization it is virtually always too expensive, and seldom
feasible, to validate selection measures for every situation in which a decision
must be made, the problem arises with considerable frequency.


There is now a growing literature on validity generalization, and it is
apparent that, for cognitive ability tests at least, validities are much more
generalizable across jobs and situations than previously thought. However,
there is no simple or universally accepted method by which the parameters
of the prediction model can be estimated, a priori, in new situations. It
is seldom possible to use classical statistical inference in a
straightforward manner. Consequently, to use validity generalization
operationally, one must use multiple methods to establish the similarity of job
tasks and job requirements across situations such that the validation data
(e.g., the multiple R of tests A and B with criterion Y) acquired in one
setting (e.g., MOS) can be used to make decisions in other settings judged
to be similar.
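For the two-predictor case used as an example above, the multiple R can be computed from the three zero-order correlations alone, which is one reason correlational summaries are convenient to carry from one setting to another. A minimal sketch; the correlation values are made up for illustration.

```python
import math


def multiple_r(r_ya, r_yb, r_ab):
    """Multiple correlation of criterion Y with predictors A and B.

    R^2 = (r_ya^2 + r_yb^2 - 2 * r_ya * r_yb * r_ab) / (1 - r_ab^2)
    """
    if abs(r_ab) >= 1.0:
        raise ValueError("predictors A and B must not be perfectly correlated")
    r2 = (r_ya**2 + r_yb**2 - 2 * r_ya * r_yb * r_ab) / (1 - r_ab**2)
    return math.sqrt(r2)


# Hypothetical correlations: test A with Y, test B with Y, and A with B.
print(f"R = {multiple_r(0.50, 0.30, 0.20):.3f}")
```

When the two tests are uncorrelated (r_ab = 0), R² reduces to the sum of the squared validities. Overlap between the tests lowers the combined validity below that sum.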

MOS Clustering

The sample of MOS for the FY83/84 cohort will be based on the
considerations previously described. Questions of statistical reliability for
estimates of individual test validity and differential validity across racial
and gender groups are paramount, as are questions concerning how well
the 19 selected MOS represent the larger population of MOS in terms of
tasks performed and skills and abilities required.

At the present time, sufficient job analysis data are not available to
permit a formalized clustering (via a cluster analysis or factor analysis
method) of MOS according to their relative similarity of task content or
job requirements. Such a data base will be built before the experimental
predictor battery is administered to the FY86/87 cohort. By that time the
population of Army MOS will be clustered into homogeneous subgroups such
that the array of 19 MOS for the FY83/84 cohort (listed in Table 1) can be
evaluated in terms of how well it represents the cluster structure of the
population. If there are gaps, the sample of MOS can be adjusted so as to
permit as much validity generalization as possible.


The specific steps to be taken to obtain the cluster solution for the
FY86/87 cohort will, in part, be a function of the results of a pilot
project now being conducted within Task 5. In that research, expert judges
are being used to cluster a sample of 111 MOS into subgroups that are
homogeneous in terms of their judged similarity in task content. A
multidimensional scaling algorithm will then be used to recover the
dimensions that seem to form the basis of the clustering. Initial analyses
of the data being collected contributed to the determination of the list of
proposed MOS given in Table 1. The results of this research should give an
indication of the feasibility of asking judges to make such judgments, the
degree of inter-rater agreement, the number of dimensions it is feasible to
use, and the level of detail required for the MOS job descriptions.

If there are no counterindications, then larger panels of experts (of at
least two types: personnel professionals and Army managers) will be used
to cluster all MOS based on similarity of job content. To do this, each
MOS will be rated on a standardized set of job content dimensions and job
requirement dimensions. The profiles of ratings will be used by the
clustering algorithm to generate clusters of MOS that are maximally
homogeneous within clusters. The MOS nearest the centroid of each cluster
would be designated a focal MOS. The more similar the MOS in a cluster are,
the more appropriate it is to use a prediction equation developed on the
focal MOS to make selection and classification decisions for all the MOS in
the cluster. Also, by varying the weights assigned to the rating dimensions,
we can note the degree of similarity between the clusters obtained using
job content dimensions vs. job requirement dimensions vs. both sets of
dimensions. The question of whether there would be greater validity
generalization within MOS clusters formed through job content dimensions,
job requirement dimensions, or some combination of both types can thereby
be examined empirically.
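The plan does not name a specific clustering algorithm. As one hypothetical illustration, the sketch below applies k-means (with deterministic farthest-first seeding) to rating profiles and designates as focal, in each cluster, the member MOS whose profile lies nearest the centroid. The nine MOS, their labels, and their ratings are invented for the example.

```python
import numpy as np


def farthest_first(profiles, k):
    """Deterministic seeding: start at row 0, then repeatedly add the farthest row."""
    chosen = [0]
    for _ in range(1, k):
        gaps = np.linalg.norm(profiles[:, None, :] - profiles[chosen][None, :, :], axis=2)
        chosen.append(int(gaps.min(axis=1).argmax()))
    return profiles[chosen].astype(float)


def kmeans(profiles, k, iters=100):
    """Plain k-means on rating profiles (Euclidean distance)."""
    centers = farthest_first(profiles, k)
    for _ in range(iters):
        labels = np.linalg.norm(profiles[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new_centers = np.array([
            profiles[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    labels = np.linalg.norm(profiles[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return labels, centers


def focal_mos(profiles, labels, centers, names):
    """For each cluster, the member MOS whose profile lies nearest the centroid."""
    focal = {}
    for j, center in enumerate(centers):
        members = np.flatnonzero(labels == j)
        if members.size:
            d = np.linalg.norm(profiles[members] - center, axis=1)
            focal[j] = names[members[d.argmin()]]
    return focal


# Hypothetical 1-7 ratings of nine MOS on four job-content dimensions.
names = [f"MOS-{i}" for i in range(1, 10)]
profiles = np.array([
    [6, 5, 2, 2], [5, 6, 2, 1], [6, 6, 1, 2],
    [2, 2, 6, 5], [1, 2, 6, 6], [2, 1, 5, 6],
    [2, 6, 2, 6], [1, 6, 1, 5], [2, 5, 2, 6],
], dtype=float)

labels, centers = kmeans(profiles, k=3)
print(labels, focal_mos(profiles, labels, centers, names))
```

Re-weighting the rating dimensions, as the plan proposes, corresponds here to multiplying the columns of `profiles` by chosen weights before clustering and comparing the resulting partitions.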

FY81/82  File  Data  Simulation 

To a certain degree, the problem can be simulated on the FY81/82 file
data. For that data base, ASVAB scores and personal history data are
available as predictors, and training school grades, SQT scores, and EER
ratings (although flawed) are available as criteria on a much larger number
of MOS than 19. Thus the prediction equations developed on 19 focal MOS can be
applied to each of the other MOS to determine how much information would be
lost if validity generalization were used rather than an empirically
developed prediction equation for the MOS. While such a simulation cannot
include psychomotor and other noncognitive predictors, because they are not
part of the file data, it will portray the validity generalization to be
expected for predictors like those already in use.

Simulations  Within  the  Focal  MOS 

Project A will collect complete data for 19 MOS. These 19 can in turn be
used to simulate a population of MOS. They can then be clustered into
homogeneous subgroups, various methods can be used to estimate empirically a
prediction equation for one or more of the MOS in a subgroup, and the
information loss incurred by using that equation for the remaining MOS in
the subgroup can be calculated.


The effect of subgroup heterogeneity on the amount of information loss can
also be explored by systematically increasing the size of the subgroups
(and the number of MOS whose data are pooled) and noting the information
loss when the resultant equation is applied to the MOS included in, as well
as excluded from, the data pool.
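The information-loss comparison described above can be sketched on simulated data. Here "information loss" is taken, as an assumption, to be the drop in squared cross-validated validity when a pooled equation replaces an MOS's own equation, evaluated on a separate validation sample for each MOS. The five MOS, their true predictor weights, and the sample sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def fit_ols(X, y):
    """OLS weights (intercept first) for predicting y from X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def validity(beta, X, y):
    """Squared correlation between predicted and actual criterion scores."""
    predicted = np.column_stack([np.ones(len(y)), X]) @ beta
    return np.corrcoef(predicted, y)[0, 1] ** 2


def simulate(weights, n):
    """Hypothetical MOS: three predictor scores and a noisy criterion."""
    X = rng.normal(size=(n, 3))
    return X, X @ np.asarray(weights) + rng.normal(size=n)


# Hypothetical true predictor weights for five MOS (illustrative only).
true_weights = {
    "MOS-A": [0.50, 0.30, 0.10],
    "MOS-B": [0.45, 0.35, 0.10],
    "MOS-C": [0.55, 0.25, 0.15],
    "MOS-D": [0.50, 0.30, 0.05],  # similar to the pool, but excluded from it
    "MOS-E": [0.10, 0.20, 0.60],  # dissimilar, excluded from the pool
}
# Separate calibration and validation samples for each MOS.
data = {m: (simulate(w, 1000), simulate(w, 1000)) for m, w in true_weights.items()}

pool = ["MOS-A", "MOS-B", "MOS-C"]
X_pool = np.vstack([data[m][0][0] for m in pool])
y_pool = np.concatenate([data[m][0][1] for m in pool])
beta_pool = fit_ols(X_pool, y_pool)

loss = {}
for m, (calib, valid) in data.items():
    own = fit_ols(*calib)  # equation developed empirically for this MOS
    loss[m] = validity(own, *valid) - validity(beta_pool, *valid)
    print(f"{m}: information loss in squared validity = {loss[m]:+.3f}")
```

The loss stays near zero for MOS-D, which resembles the pooled MOS, and becomes large for MOS-E, which does not. This mirrors the role that subgroup homogeneity plays in the plan.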

Similarity  Scaling 

Once the tests that will be used in the FY86/87 validation sample are
identified, it will be possible to carry out another kind of scaling
investigation that will address the question of similarity in prediction
equations across MOS. Personnel psychologists who understand the ability domains
must be used as judges, and considerable care must be taken to develop
thorough descriptions of each MOS in the array to be scaled.

If the 19 MOS for which we will have extensive predictor and criterion data
are considered the "focal" MOS, then the relative similarity of each focal
MOS to every other MOS can be scaled using psychologists as judges. The
MOS should be representative of the cluster structure previously
identified. The judges would estimate similarity on the basis of the relative
amount of each major ability factor (as determined by analysis of the
experimental predictor battery) required by a particular MOS in comparison to
each of the 19 focal MOS. Thus there would be a similarity profile (across
the major ability factors) that could be used to predict the level of
validity and the pattern of predictor weights for each MOS not in the
research.


The design permits a number of internal validity checks for the scaling
procedure and can even be "validated" for each focal MOS by comparing the
predicted results to the actual results for the other 18 MOS for which we
will have data. Also, for each MOS not in the research there will be 19
estimates of what the validity of the predictor battery should be. If the
scaling were perfect, the estimates would converge on the same number.
Obviously, the results will not be that precise, but to the extent the 19
estimates converge, we can be more confident that the estimated prediction
equation for an MOS not among the focal 19 will be a reasonable one.
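The plan does not fix a rule for deciding when the 19 estimates "converge" or how the check against the other 18 focal MOS is scored. As one hypothetical way to summarize both, the sketch below reports the mean and spread of the estimates for a non-research MOS against an arbitrary tolerance, and scores a focal MOS by the mean absolute error of its predictions for the other focal MOS. All numbers, including the 0.05 tolerance, are invented.

```python
from statistics import fmean, pstdev


def convergence(estimates, tolerance=0.05):
    """Summarize the focal-MOS estimates of one non-research MOS's validity.

    Returns the mean estimate, the spread (population SD), and whether the
    spread is within a chosen (hypothetical) tolerance.
    """
    center = fmean(estimates)
    spread = pstdev(estimates)
    return center, spread, spread <= tolerance


def focal_check(predicted, actual):
    """Mean absolute error when one focal MOS's scaling is used to predict
    the actual validities of the other focal MOS."""
    return fmean(abs(p - a) for p, a in zip(predicted, actual))


# Hypothetical estimates of one non-research MOS's validity, one per focal MOS.
estimates = [0.31, 0.28, 0.35, 0.30, 0.33, 0.29, 0.27, 0.32, 0.34, 0.30,
             0.31, 0.26, 0.33, 0.29, 0.36, 0.30, 0.32, 0.28, 0.31]
mean_est, sd_est, converged = convergence(estimates)
print(f"mean {mean_est:.3f}, SD {sd_est:.3f}, converged: {converged}")
```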

To carry out this research, 30 judges will be required, and the scaling task
will be extensive in terms of time and effort. Pilot work using 5-10
judges will be carried out to determine the most feasible way to describe
MOS and to conduct the scaling sessions so as to minimize the time burden.

Data from the previously described clustering research can also be
portrayed in the above fashion. That is, for each focal MOS the
correlation of the profile of task dimension ratings for the focal MOS with
each other MOS can be examined. Again, for each focal MOS, 18 of these
correlations can be compared against the actual result. An important
research question is whether the relative similarities portrayed by scaling
ability requirements are comparable to those obtained by scaling task
dimensions. Also of particular interest is whether MOS drawn from the same
clusters tend to have estimated prediction equations that are more similar
to one another than equations from MOS drawn from different clusters.
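The profile comparison described above can be sketched as follows, assuming each MOS is represented by a vector of task-dimension ratings and that profile similarity is measured by a Pearson correlation. MOS labels, ratings, and the ability-based similarities are invented; the final line correlates the two similarity structures to address whether ability scaling and task-dimension scaling agree.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def profile_similarities(profiles: dict[str, list[float]], focal: str) -> dict[str, float]:
    """Correlate the focal MOS's task-dimension profile with every other MOS."""
    return {mos: pearson(profiles[focal], p) for mos, p in profiles.items() if mos != focal}


# Hypothetical ratings on four task dimensions (invented values).
task_profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 4],
}
task_sim = profile_similarities(task_profiles, "A")  # similarity of B, C, D to focal A

# Hypothetical similarities of B, C, D to A from the ability-requirement scaling.
ability_sim = {"B": 0.9, "C": 0.2, "D": 0.6}

# Agreement between the two similarity structures over the same MOS pairs.
agreement = pearson([task_sim[m] for m in ability_sim],
                    [ability_sim[m] for m in ability_sim])
```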


While most likely none of these individual methods will provide a
definitive estimate of the prediction equation for an MOS not in the research
sample, when taken together they should provide a reasonable and
appropriate estimate. In the process, a number of basic research questions about
validity generalization will have been addressed, and we will be much better
prepared to consider future questions of validity generalization as the MOS
structure changes.


GENERAL  OUTCOMES 


The individual task plans speak to the specific operational and scientific
outcomes that will be produced by Project A. These reflect a number of
basic themes that should be kept in mind.

1) Project A will generate a broader and more complete
sample of the predictor space than has ever been used in a
selection investigation before. The taxonomy of
predictors that is established will stand as a reference
point for many years to come.

2) Project A will provide the most thorough attempt ever
made to develop standardized tests of actual task
performance in skilled jobs. The procedure used will stand
as a model to copy.

3) Project A will be by far the most thorough test to date
of whether success in training predicts success on the
job.

4) Project A will provide a state-of-the-art model for how
construct validity can be used to study applied problems
in selection and performance assessment. It is our
belief that the validation strategy used here anticipates
how the validity concept will be reformulated in the
forthcoming revision of the Joint Standard for the Use




5) Project A will be the first large selection and
classification research effort to incorporate utility in the
development of operational decision rules. It is a
procedure that will most likely be copied many times.

6) Given the broad range of predictors, criteria, and jobs,
Project A will be the most comprehensive test ever
conducted on questions of differential predictability
across gender and racial groups.

7) State-of-the-art answers will be produced about the
extent of validity generalizability across jobs,
criterion measures, and predictor constructs.

The overall conclusion to be drawn from the above is that although Project
A will be time-consuming and relatively expensive, the scientific and
practical payoffs will exceed the costs many times over.




TASK  1 


VALIDATION  OF  RELATIONSHIPS  AMONG  PREDICTORS 
AND  PERFORMANCE  MEASURES 


GENERAL  PURPOSE 

Within the context of Project A, the primary responsibility of Task 1 is to
perform validations of the classification measures. To ensure successful
validations, the staff of Task 1 also must work closely with the staff of
other tasks, particularly in performing the statistical and psychometric
analyses that support the identification and development of new measures.
In addition, while the Computerized Personnel Allocation System (CPAS) is
still under development, the ASVAB composites that are the primary basis for
the current selection and classification procedure have to be updated
periodically so that they are maximally effective for their present use.
Finally, for the Army's resources to be used efficiently in the
implementation of the CPAS, the cost-benefit of alternative selection and
classification procedures must be evaluated in terms of the utility of their
outcomes. Task 1 will carry out such evaluation in coordination with
Project B.

In  summary,  the  purposes  of  Task  1  within  Project  A  are: 

(1) to recommend revisions of Armed Services Vocational
Aptitude Battery (ASVAB) composites (for versions 8/9/10
and later for versions 11/12/13) as required in the
current selection and classification procedures;


(2) to validate the project-developed classification
instruments, and to develop accurate prediction models
of the future job performance of enlistees;

(3) to generate appropriate inputs to Project B as required
for the implementation of the CPAS;

(4) to evaluate the cost-benefit of alternative
classification procedures; and

(5) to provide technical support to the staff of other tasks
so as to ensure the psychometric quality of
project-developed measures and the adequacy of the data for
the validation of the classification instruments.


It should also be noted that another important purpose of Task 1 is to
create and maintain a longitudinal research data base to meet the needs of
the present project as well as other ARI projects (e.g., Project B). A
comprehensive longitudinal research data base plan has been prepared
separately, so the content of the present document is limited to
analyses that will be executed using that data base.




BACKGROUND  ISSUES  AND  RATIONALE 


Each year more than one hundred thousand new recruits are selected,
classified, trained, and assigned to perform the hundreds of jobs required
for an effective Army. The system presently employed by the Army for making
the initial selection and classification decision has a long history. The
development of the primary measure currently used in the system, the ASVAB
8/9/10, can be traced back through earlier forms to the ACB-73, the AQB, the
AFQT and ACB, the AGCT, and the original Army Alpha.

In order for an applicant to be qualified for initial enlistment into the
Army by the present selection and classification system, he/she must meet a
number of eligibility criteria, including age, moral standards, physical
standards, and "trainability." The latter determination, the most relevant
in the current context, is based upon a combination of two sets of criteria:
scores attained on the Armed Services Vocational Aptitude Battery (ASVAB),
and educational attainment. The ASVAB is currently administered as an entry
test at Military Entrance Processing Stations (MEPS; formerly called AFEES),
or at Mobile Examining Team (MET) sites. It is also administered by MET to
high school juniors and seniors; these scores are used for guidance
counseling, and are also provided to Army recruiters as a means of
identifying mentally qualified recruitment prospects. In addition to the ASVAB,
non-high school graduates are administered a short biographical
questionnaire, the Military Applicant Profile (MAP), which has been found to
be a useful tool for identifying individuals who are likely to be poor risks
in terms of probability of completing Army initial entry training.

For applicants who have not previously taken the ASVAB and whose
educational/mental qualifications appear to be marginal based on the Army's
trainability standards, a short Enlistment Screening Test may be
administered to assess the prospects of passing the ASVAB. Applicants
who appear, upon initial recruiter screening, to have a reasonable prospect
of qualifying for service are referred either to a MET site for
administration of the ASVAB, or directly to a MEPS. MEPS staff complete all
aspects of the screening process, including administration of the mental and
physical examinations. Based on the information assembled, classification
and assignment to a particular training activity are made for those found
qualified for enlistment.

About 80 percent of Army enlistees enter the Army under a specific
enlistment option that guarantees choice of initial school training, career
field assignment, unit assignment, or geographical area. For these
applicants, the initial classification and training assignment decision must
be made prior to entry into service. This is accomplished at the MEPS by
referring applicants who have passed the basic screening criteria (mental,
physical, moral) to an Army guidance counselor, whose responsibility is to
match the applicant's qualifications and preferences to the Army's current
skill training requirements, and to make "reservations" for training
assignments consistent with the applicant's enlistment option.


The classification and training "reservation" procedure is accomplished by
the Recruit Quota System (REQUEST), which was implemented in 1973. REQUEST
is a computer-based system to coordinate the information needed to reserve
training slots for volunteers. One major limitation is that REQUEST uses
simple, minimum qualifications for accessions control. Thus, to the extent
that an applicant may minimally qualify for a wide range of courses or
specialties based on aptitude test scores, the initial classification
decision is governed by (a) his/her own stated preference (often based upon
limited knowledge about the actual job content and working conditions of the
various military occupations), (b) the availability of training slots, and
(c) priorities/needs of the Army. Numerous procedures for improving the
system are under development. These include the "MOS Match Module" and the
previously mentioned Project B Computerized Personnel Allocation System, as
well as other smaller efforts.
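To make the limitation concrete, here is a simplified sketch of minimum-cutoff screening followed by preference-driven reservation. It assumes one composite cutoff per course and omits factor (c), Army priorities; course names, cutoffs, and scores are hypothetical, and this is not REQUEST's actual logic.

```python
from typing import Optional


def eligible_courses(scores: dict[str, int],
                     cutoffs: dict[str, tuple[str, int]]) -> list[str]:
    """Courses whose single minimum composite cutoff the applicant meets."""
    return [course for course, (composite, minimum) in cutoffs.items()
            if scores.get(composite, 0) >= minimum]


def reserve(scores: dict[str, int],
            cutoffs: dict[str, tuple[str, int]],
            preferences: list[str],
            open_slots: dict[str, int]) -> Optional[str]:
    """Reserve the first preferred course that the applicant qualifies for and that has a slot.

    Scores above a cutoff carry no extra weight: only pass/fail matters here.
    """
    eligible = set(eligible_courses(scores, cutoffs))
    for course in preferences:
        if course in eligible and open_slots.get(course, 0) > 0:
            open_slots[course] -= 1
            return course
    return None


# Hypothetical courses, composite cutoffs, and applicant data.
cutoffs = {"Course X": ("EL", 100), "Course Y": ("CL", 95), "Course Z": ("MM", 90)}
scores = {"EL": 105, "CL": 90, "MM": 110}
slots = {"Course X": 0, "Course Y": 5, "Course Z": 3}
choice = reserve(scores, cutoffs, ["Course Y", "Course X", "Course Z"], slots)
```

The applicant's high MM score earns no more than a bare pass would: once cutoffs are met, preference order and slot availability alone decide the outcome.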

This review of current practice suggests that the present selection and
classification procedures could be improved by taking advantage of recent
technological advances and developments in decision theory. There is a need
for a formal decision-making procedure aimed at
maximizing the overall utility of the classification outcomes to the Army.
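A formal utility-maximizing procedure of the kind called for here can be stated as an assignment problem. The sketch below assumes a square table of expected utilities (one applicant per job, values invented) and finds the best one-to-one assignment by brute force, alongside a greedy applicant-by-applicant rule for comparison. Realistic problem sizes would call for an assignment algorithm such as the Hungarian method or linear programming rather than enumeration.

```python
from itertools import permutations


def best_assignment(utility: list[list[float]]) -> tuple[float, tuple[int, ...]]:
    """Exhaustively find the one-to-one assignment with the largest total utility.

    utility[i][j] is a hypothetical expected payoff of placing applicant i in job j.
    Brute force is only practical for tiny examples (n! permutations).
    """
    n = len(utility)
    best_total, best_perm = float("-inf"), tuple()
    for perm in permutations(range(n)):
        total = sum(utility[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


def greedy_assignment(utility: list[list[float]]) -> float:
    """Each applicant in turn takes the best remaining job (for comparison)."""
    taken, total = set(), 0
    for row in utility:
        job = max((k for k in range(len(row)) if k not in taken), key=lambda k: row[k])
        taken.add(job)
        total += row[job]
    return total


U = [[9, 7, 3],
     [8, 4, 1],
     [6, 5, 2]]
optimal_total, optimal_jobs = best_assignment(U)
greedy_total = greedy_assignment(U)
```

Here the greedy rule reaches a total utility of 15 while the optimal assignment reaches 17, illustrating why a formal procedure that considers applicants jointly may outperform one-at-a-time choices.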




However, this decision process must allow for the potentially adverse
impacts on recruitment if the enlistee's interests, work values, and
preferences are not given sufficient consideration. There are clear
trade-offs that must be evaluated between the procedures necessary (a) to
attract qualified people, and (b) to put them into the right slots.

From this perspective, a classification system must be built upon a thorough
understanding of what constitutes effective performance in the Army. In
addition, a basis for estimating (predicting) an enlistee's future
performance from pre-induction information needs to be established. In
order to design a formal classification procedure to improve personnel
utilization in the Army, we need a predictor battery that is
maximally valid and can be administered efficiently at the MEPS. While the
Army has long been successful in making use of selection and classification tests,
the prediction system could most probably be further improved, particularly
by adding noncognitive tests (e.g., psychomotor skills and vocational
interests). The improved and/or newly developed predictors need to be
validated in terms of the incremental utility they will contribute in
addition to the existing predictors. Task 1 is devoted to performing such
validations, and at the same time empirically developing accurate prediction
models to be employed in the Computerized Personnel Allocation System
(CPAS).
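Incremental validity over existing predictors is often summarized as the gain in squared multiple correlation. A minimal sketch for the two-predictor case, computed from correlations alone, follows; the values are invented, and translating a gain in R^2 into the incremental utility referred to above would additionally require utility measures such as those planned under Task 4.

```python
def r_squared_two(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of a criterion on two predictors.

    r_y1, r_y2: predictor-criterion correlations; r_12: predictor intercorrelation.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_validity(r_y1: float, r_y2: float, r_12: float) -> float:
    """Gain in R^2 from adding predictor 2 to an equation that already has predictor 1."""
    return r_squared_two(r_y1, r_y2, r_12) - r_y1 ** 2


# Invented values: existing composite (1) and a new noncognitive measure (2).
gain_uncorrelated = incremental_validity(0.50, 0.30, 0.0)   # about 0.09
gain_overlapping = incremental_validity(0.50, 0.30, 0.20)   # about 0.042
```

The same new measure adds less when it overlaps with the existing composite (r_12 = 0.20), which is why a new predictor is judged by its increment rather than its stand-alone validity.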




To further improve the effectiveness of personnel utilization in the Army,
the personnel allocation system should consider incorporating information
gathered during training and the soldier's earlier career into the
prediction of his/her subsequent performance. On the basis of this enhanced
prediction model, a sequential/dynamic decision process can be established
to systematically update the assignment of enlisted personnel to the jobs that
will benefit most from their current skills and qualifications. In
support of building such a decision model, Task 1 is also aimed at
validating the additional post-enlistment predictors (i.e., school/training
predictors and in-service predictors) against Army-wide as well as
job-specific performance criteria.




SPECIFIC  OBJECTIVES 


Project A is designed to provide an empirical basis for optimal selection,
classification, and utilization of Army enlisted personnel. Optimization
will be achieved by allocating personnel in accordance with predictions of
their performance in different assignments. Within Project A, Task 1 is
designed to evaluate both existing predictors and predictors developed by
other Tasks in Project A in terms of the extent to which they meet the
goal of the project.

The  objectives  of  Task  1  fall  into  three  categories: 

(1)  Evaluation  of  Existing  Predictors; 

(2)  Support  for  the  Development  of  New/Improved  Measures;  and 

(3) Evaluation of New/Improved Predictors.

In the course of meeting these objectives, Task 1 will be responsible for
the development and maintenance of a Longitudinal Research Data Base (LRDB),
as described by Wise and Wang (1983).

Each  of  the  three  major  categories  of  objectives  can  be  divided  into 
specific  objectives  that  will  be  carried  out  over  the  period  of  the  project. 
Table  1-1  presents  the  projected  dates  for  accomplishing  each  of  the 
specific  objectives: 




Table 1-1.  Projected Dates for Accomplishing the Specific Objectives of Task 1.

Objectives                                            Projected Dates of Accomplishment

1. Evaluation of Existing Predictors

   1.1. Development of Area Composites for            March 1984
        ASVAB 8/9/10

   1.2. Production of Early Reports on Validation     November 1983 to April 1985
        Issues (validity generalization, cultural     (separate reports)
        fairness, and cross-validation)

   1.3. Comparison of Computer-Administered and       December 1986
        Paper-and-Pencil Versions of ASVAB

   1.4. Development (refinement) of ASVAB Area        May 1987
        Composites for Forms 11/12/13

2. Support for Development of New/Improved Measures

   2.1. Identification of predictor dimensions        August 1983
        requiring improvement                         (input to Task 2)

   2.2. Evaluation of existing performance            December 1985
        measures (training and Army-wide)

   2.3. Assistance in design and analysis, and        Throughout the period of the
        review of reports                             project, as requested (see
                                                      Research Plans for Tasks 2
                                                      through 5 for schedules)

   2.4. Validation of proposed new and improved       April 1987
        predictors (using FY83/84 cohort data)

   2.5. Support for the development of utility        September 1986
        measures of performance

3. Evaluation of New/Improved Predictors

   3.1. Generation of input to CPAS                   January 1988 to August 1989
                                                      (in stages)

   3.2. Cost-benefit comparisons of alternative       September 1989
        classification procedures

   3.3. Production of follow-up reports on            September 1989
        validation issues (validity generalization,
        cultural fairness, cross-validation,
        stability of relationships)

   3.4. Development of procedures for updating        September 1989
        CPAS parameter estimates



Evaluation  of  Existing  Predictors  will  focus  on: 


(1.1) development and assessment of the discriminant
validity of new area composite scores for ASVAB
forms 8, 9, and 10;

(1.2) production of reports on the validity of assumptions
involved in use of the ASVAB for personnel
classification, including validity generalization, cultural
fairness, and cross-validation of predictive
relationships;

(1.3) development and assessment of the discriminant
validity of new area composite scores for ASVAB
forms 11, 12, and 13.
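The plan does not fix a method for the cultural fairness analyses. One widely used regression-based check, sketched here under that assumption, fits the criterion-on-predictor line separately in each subgroup and examines each group's mean residual from a common pooled line. Group labels and data are invented, and a real analysis would add significance tests for slope and intercept differences.

```python
from statistics import fmean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept and slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def mean_residuals_by_group(data: dict[str, tuple[list[float], list[float]]]) -> dict[str, float]:
    """Mean (observed - predicted) per group, using one pooled regression line.

    Negative: the common line overpredicts that group; positive: underpredicts.
    """
    all_x = [v for xs, _ in data.values() for v in xs]
    all_y = [v for _, ys in data.values() for v in ys]
    a, b = fit_line(all_x, all_y)
    return {g: fmean(yv - (a + b * xv) for xv, yv in zip(xs, ys))
            for g, (xs, ys) in data.items()}


# Invented predictor/criterion pairs for two hypothetical subgroups.
data = {
    "group 1": ([1, 2, 3], [2, 4, 6]),
    "group 2": ([1, 2, 3], [3, 5, 7]),
}
per_group_lines = {g: fit_line(xs, ys) for g, (xs, ys) in data.items()}
residuals = mean_residuals_by_group(data)
```

In this toy case the subgroup slopes are equal but the intercepts differ, so a single pooled equation overpredicts group 1 and underpredicts group 2 by the same amount.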


The initial work in this phase of the project will make use of data on the
FY81/82 cohorts of Army enlisted personnel, including a special training
data file developed by the Army Research Institute for use by this project.
Subsequent analyses will make use of data from the FY83/84 cohorts. A major
step will be taken between the first and second area composite score
development efforts: differential utility of performance will be included as the
criterion for optimality.


Support for the Development of New/Improved Measures will focus on:


(2.1) identification of areas in which improvements in
existing predictors for classification decisions are
most needed;

(2.2) evaluation of current training outcome measures and
general performance indicators (e.g., EER,
discipline actions) as additional predictors of
subsequent performance;

(2.3) response to requests for sampling, design, and
analytical assistance and review from other Task
Leaders;

(2.4) validation of proposed new and improved predictors,
employing data on the FY83/84 cohorts; and

(2.5) analyses in support of the development of measures
(in Task 4) of the utility to the Army of several
performance levels in different MOS.


The  first  two  of  these  objectives  will  aim  to  make  best  use  of  existing  data 
bases  to  provide  information  to  the  project  staff  charged  with  the 
development  of  new  and  improved  measures.  In  providing  assistance  to  other 
Task  Leaders,  Task  1  will  endeavor  to  coordinate  the  data  base  and  analytic 
activities  of  the  other  Tasks  so  as  to  minimize  the  overlap  of  efforts  while 
maximizing  the  exchange  of  empirical  results.  The  efforts  of  Task  1  with 
respect  to  FY83/84  data  will  mainly  involve  analyses  that  cut  across  the 
other  tasks. 


The  Evaluation  of  New/Improved  Predictors  will  focus  on: 


(3.1) generation of inputs to the Computerized Personnel
Allocation System being developed in Project B;

(3.2) cost-benefit comparisons of alternative measurement
and assignment strategies for selection,
classification, and utilization of Army enlisted personnel at
various points in their careers;

(3.3) production of reports on the validity of assumptions
involved in use of new and improved predictors for
personnel classification, including stability of
relationships across time, validity generalization,
cultural fairness, and cross-validation of predictive
relationships; and


(3.4)  development  and  validation  of  procedures  for  updating 
CPAS  parameter  estimates  on  the  basis  of  alternative 
data  collection  strategies. 
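Cross-validation of predictive relationships can be sketched as deriving regression weights in one sample and correlating the resulting predictions with observed criteria in another (for example, a later cohort). The two-predictor least-squares fit below is a minimal stand-in for whatever prediction models are actually used; function names and data are hypothetical, and the toy data are exactly linear so the result is easy to check.

```python
import math
from statistics import fmean


def fit_two_predictor(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 via centered normal equations."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are collinear in the derivation sample")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def cross_validated_r(derivation, holdout):
    """Fit weights on the derivation sample, then correlate predictions with holdout outcomes."""
    b0, b1, b2 = fit_two_predictor(*derivation)
    x1, x2, y = holdout
    predicted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    return correlation(predicted, y)


# Hypothetical samples generated from y = 1 + 2*x1 + 3*x2 with no error.
derivation = ([1, 2, 3, 4], [2, 1, 4, 3], [9, 8, 19, 18])
holdout = ([5, 6, 7], [1, 7, 2], [14, 34, 21])
r_cv = cross_validated_r(derivation, holdout)
```

With real data the cross-validated correlation is typically lower than the multiple correlation in the derivation sample (shrinkage); making the same comparison across cohorts bears on the stability of relationships over time.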

The work in this final phase of the project will be based primarily on the
longitudinal data collection on the FY86/87 cohorts. All new and improved
predictors and criteria developed in this project will be administered to
these soldiers, so that longitudinal predictive validation can be performed.
The evaluations will be coordinated with Project B so that increments in
validity can be evaluated in the context of actual supply and demand
constraints on Army personnel assignments. The last objective (number 3.4)
will provide the flexibility needed for continuous operation of the CPAS in
the face of changing supplies of and demands for personnel with particular
knowledge, skills, and abilities.


OVERALL  SUMMARY  OF  THE  PROCEDURE 


The work of Task 1 has been divided into eight functional Subtasks. The
numbering of these Subtasks has been designed to facilitate correspondence
with the other Tasks. In addition to the analytical subtasks, the
Longitudinal Research Data Base (LRDB) has been included as Subtask 1.1, and
Management and Coordination have been included as Subtask 1.6. The Subtasks
are listed below.


Subtask 1.1: LRDB Development and Maintenance;

Subtask 1.2: Support for the Development of New/Improved
Pre-Induction Measures (Task 2);

Subtask 1.3: Support for the Development of New/Improved
Training Outcome Measures (Task 3);

Subtask 1.4: Support for the Development of New/Improved
Army-wide Criteria (Task 4);

Subtask 1.5: Support for the Development of MOS-specific
Criteria (Task 5);

Subtask 1.6: Management and Coordination with Other Tasks
and with Project B;

Subtask 1.7: Validation of Existing Predictors; and

Subtask 1.8: Validation of New/Improved Predictors.


Time lines for these Subtasks and their interfaces with other Tasks and
Subtasks are described in the Integrated Master Plan for Project A. For the
purposes of this Research Plan, we have organized the Subtasks in terms of
the specific objectives described in the preceding section. Thus, we first
discuss our plans for the Evaluation of Existing Predictors, which will be
carried out as Subtask 1.7. Second, we discuss our plans for Support for
the Development of New/Improved Predictors and Criteria, which will be
carried out as Subtasks 1.2 through 1.5. Finally, we discuss the
longitudinal validation of the combined new and improved system for
personnel selection, classification, and utilization, which will be carried
out as Subtask 1.8.

We have devoted most attention to the Evaluation of Existing Predictors and
to the Evaluation of New/Improved Predictors, because the research plans of
the other Tasks describe the needed analyses for the Development of
New/Improved Measures in detail. To reiterate them in Task 1 would only
create redundancy in the content of the Project A Research Plan.


PROCEDURE 


Task 1 in Project A plays a dual role: (a) carrying out validation analyses
to provide the foundation for the CPAS, and (b) supporting the research
efforts of the rest of the project. In the former category fall the
validation of current measures, including the development of ASVAB area
composite scores based on the FY81/82 cohort data, and the validation of the
new and improved battery that will be developed over the course of this
project. In the latter category fall the development of the Longitudinal
Research Data Base and the various analyses needed to support the
development of new MEPS-level predictors, training outcome measures,
Army-wide criteria, and MOS-specific performance measures.

Given  this  multiple  role,  we  have  divided  our  research  plan  into 
three  sections: 

Section 1. Validating existing predictors for use in selection and
classification (Subtask 1.7 in the Integrated Master Plan);

Section 2. Supporting the development of new/improved measures
(Subtasks 1.2 through 1.5 in the Integrated Master Plan); and

Section 3. Validating new and improved predictors for use in the
CPAS (Subtask 1.8 in the Integrated Master Plan).

Activities in Subtasks 1.2 through 1.5 will proceed continuously throughout
the project, while those in Subtask 1.7 will be replaced by Subtask 1.8 when
we turn to FY86/87 cohort analyses. A number of methodological issues
surrounding the validation analyses will be examined concurrently with these
activities. As noted earlier, Subtask 1.1 is discussed in a separate
document, the Longitudinal Research Database Plan.

Section 1. Validation of Existing Predictors for Use in Army Enlisted Personnel Selection and
Classification Procedures

The development of new and improved instruments for prediction of
performance must be based on a thorough evaluation of the current procedures.
A major effort assigned to Task 1 is to perform analyses of existing data to
determine the validity of the existing ASVAB, supplemented by
currently available background data. This subtask is further defined as the
development of area composite scores based on the current ASVAB; that is,
the identification of those scoring procedures that make best use of the
ASVAB for Army personnel selection and classification decisions in the
context of their present use.

Development of ASVAB Area Composite Scores

The initial major analytical effort in this project will, to a great extent,
aim to identify the best area composite scores that can be derived from the
current ASVAB. The results may either corroborate use of the current
composites or indicate new or revised ones. This effort is of
substantial significance to the Army, and is a major product of the first 18
months of the project. It is also a "rehearsal" for the subsequent
improvements to the Army selection and classification system to be developed
in this project. To achieve the objectives of this initial effort, several
methodological issues that will affect the ultimate results of the project
will be dealt with.

The initial improvements in the selection and classification system will be
undertaken entirely within the framework of the existing ASVAB usage and
will make extensive use of the large amount of prior work on ASVAB area
composite scores, especially the validations carried out by the Center for
Naval Analyses (Maier, 1981, 1982; Sims, 1978; Sims & Mifflin, 1978; Sims &
Hiatt, 1981). Nevertheless, using the special data base developed for the
Army on the FY81/82 cohort by ARI, we expect to be able to provide
significant enhancements to the current state of knowledge concerning the
proper use of ASVAB scores for training/job assignments.

The current version of the ASVAB (Forms 8/9/10) was introduced in October 1980.
There are nine composites, defined largely on the basis of the validation
of ASVAB 6/7. Each of the nine composites is used to determine the
qualification of an applicant for one of nine specific MOS groups. In
addition, the Army continues to define the AFQT composite for use in the
initial screening of applicants. Another composite (G or GT, General
Technical) is also defined, but is not associated with specific MOS groups. Of
these 11 composites, four are very similar to the composites used in other
services; they are generally referred to as MAGE (M: Mechanical Maintenance;
A: Administrative/Clerical; G: General or General Technical; and E:
Electronics Repair).

The initial validation of the ASVAB 8/9/10 composites as the Army's
selection and classification predictors was carried out by Maier in 1981
(loc. cit.), employing final course grades and job proficiency tests as
criteria. He concluded that the composites defined on the basis of ASVAB
6/7 validations are valid predictors of training success as well as job
proficiency. However, the data used to conduct the validation were based on
ASVAB 6/7 scores. Later, in 1982, Maier (loc. cit.) validated these ASVAB
composites with scores on forms 8/9/10 for the Marine Corps, using final
course grades as the criterion. In general, his results confirm the
predictive validity of the composites. Other studies, such as those by Sims
and his colleagues (loc. cit.), also substantiate the validity of these
composites, albeit all based on ASVAB 6/7 data.

This brief review of past validations of the existing ASVAB 8/9/10
composites reveals that a complete validation of the current ASVAB tests is
still to be carried out. In response to this need, the Army Research
Institute has collected a comprehensive set of training performance data on
the FY81/82 recruits who were among the first to take ASVAB 8/9/10 and
attended Army schools during CY81. The training graduates of this cohort of
recruits are now in their first tour, and many have taken the Skill
Qualification Test (SQT). Additionally, other general performance records
for them are available from the Enlisted Master File (EMF). Thus it
is now possible for us to conduct a validation of ASVAB 8/9/10 tests as
predictors in the selection and classification of the Army's recruits, using
not only training performance but also SQT scores and general
performance indicators (e.g., EERs, attrition, disciplinary actions) as
criteria.

Our approach will be essentially empirical, emphasizing computation of area
composites that have been found to be indicative of successful training
outcomes and proficient execution of tasks in the field. The initial effort
will focus on the predictive validity (absolute as well as differential) of
the ASVAB 8/9/10 subtests and the composites currently in use. On the basis
of these evaluations, we will then determine whether the effectiveness of the
ASVAB in current Army selection and classification practice could be
improved either by modifying the existing composites or by developing new
ones. Ideally, whether to continue using the current composites or to adopt
new ones should be assessed by the increase in total performance
(effectiveness) in the Army as a result of basing selection and classification
decisions on the revised composites. However, the validation with the
FY81/82 cohort data will not be conducted strictly in this context, because
utility measures for integrating job performance in the Army into a
single effectiveness scale will not be available until 1985. Therefore our
present effort will essentially follow the traditional validation approach.
When the utility measures are fully developed by this project, the new ASVAB
composites will be revalidated more formally.


The FY81/82 cohort validation of ASVAB 8/9/10 will begin in May 1983 and
conclude in March 1984. An interim report will be submitted to ARI for
review in October 1983. Based on comments from ARI, we will finalize our
recommendations on the set of composites to be used beginning in October
1984. Because the ASVAB 11/12/13 forms are scheduled for administration in
October of this year, the area composites will in effect be applied to
ASVAB 11/12/13 scores. Clearly, the present validation of ASVAB 8/9/10 will
also have to address the issues that may arise in defining ASVAB 11/12/13
composites. Because the subtests of forms 11/12/13 are essentially the same
as those of forms 8/9/10, we do not anticipate special difficulties in
adopting the revised composites for the new tests. Employing data on forms
11/12/13 to be collected from the FY83/84 and FY86/87 cohorts, we will
continue the validation of the ASVAB in order to assess the validity of the
composites using improved sets of criteria and to revise them as required.

In what follows, we first present the objectives of this validation effort
and then describe the procedures that we will follow to accomplish them.

Objectives. The objective of this subtask is to identify a set of area
composite scores (or, more generally, ASVAB scoring rules) that:

(1) are feasible to implement;

(2) maximize expected performance, when properly implemented; and

(3) exhibit appropriate stability.


Each of these requirements is exceedingly complex and involves controversies
that must be addressed. We plan to address them in such a way that the
results of this effort will help lay the groundwork for the remainder of
the project.

The question of feasibility of implementation concerns such issues as the
type of coefficients that can be used in combining subtest scores, the
number of subtests in each composite, the number of different composites,
the possibility of adjustments based on subsidiary information (such as the
"add 10 points for high school graduation" rule proposed by Sims & Hiatt,
1981), and the use of multiple cutoffs for a single MOS. These issues will
be addressed by comparing the predictive validities of alternative sets of
composites that vary in these respects.

The current ASVAB Area Composites are computed as sums of the subtest
standard scores (each subtest is scaled to have mean 50 and s.d. 10) and
then converted to a scale that is comparable across the composites (with
mean 100 and s.d. 20). As a starting point, we will define composites that
employ unit weighting and include 3 or 4 subtests in each composite.
Changes in these traditional practices will be recommended only if they
result in a significant increase in validity.
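To make the scaling concrete, the following is a minimal sketch, assuming a
reference (norming) sample is available as a list of records of subtest
standard scores. The function name composite_scaler and the subtest keys
are illustrative only, not the operational ASVAB scoring rules. It forms the
unit-weighted sum of the chosen subtests and rescales that sum to mean 100
and s.d. 20 in the reference sample.

```python
from statistics import fmean, pstdev

def composite_scaler(norm_sample, subtests):
    """Build a scorer for one unit-weighted area composite.

    norm_sample: list of dicts mapping subtest name -> standard score
    (mean 50, s.d. 10) for a reference sample (hypothetical input format).
    """
    sums = [sum(person[s] for s in subtests) for person in norm_sample]
    mu, sd = fmean(sums), pstdev(sums)

    def score(person):
        raw = sum(person[s] for s in subtests)   # unit weights: plain sum
        return 100 + 20 * (raw - mu) / sd        # rescale to mean 100, s.d. 20
    return score

# Illustrative use with made-up subtest keys and simulated scores.
if __name__ == "__main__":
    import random
    rng = random.Random(1)
    names = ("S1", "S2", "S3")
    norm = [{s: rng.gauss(50, 10) for s in names} for _ in range(1000)]
    scorer = composite_scaler(norm, names)
    print(round(scorer({"S1": 60, "S2": 55, "S3": 50}), 1))
```

Because positively correlated subtests produce a sum whose s.d. exceeds 10
times the square root of the number of subtests, the rescaling constants
are taken from the reference sample rather than from the subtest scale alone.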


Maximization of total expected performance (or allocation average; Brogden,
1946) requires consideration of the constraints of MOS requirements and the
supply of applicants, as well as a common utility scale for performance in
different MOS. The necessary utility data will not be available for this
initial effort, so we must rely on the same assumption that previous efforts
have incorporated: that all measured increments of performance (expressed
in standard scales with a common mean and s.d. across MOS), whether within
or between MOS, are equally valuable. The question of requirements and
supply will be examined by simulation, i.e., by generating data by computer
according to the anticipated supply of recruits, making job assignments
based on the composites to fill the quotas, and then evaluating the expected
performance of the outcome. These simulations will be coordinated with
Project B staff.
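The simulation logic can be sketched as follows, under loudly simplifying
assumptions: each simulated recruit has one standardized composite per MOS
(drawn independently here, which real composites are not), each composite
correlates with performance in its MOS at an assumed validity, and recruits
are assigned by a greedy rule that fills the highest-scoring recruit/MOS
pairs first. The greedy rule is a heuristic swapped in for illustration, not
an optimal assignment, and simulate_allocation is a hypothetical name. The
function returns the mean performance of the assigned recruits, an estimate
of the allocation average; random assignment of the same simulated recruits
to the same quotas serves as a baseline.

```python
import random

def simulate_allocation(n, quotas, validity, greedy=True, seed=0):
    """Mean performance of n simulated recruits assigned to MOS under quotas.

    quotas: dict MOS -> slots (must sum to n); validity: assumed correlation
    between each MOS composite and performance in that MOS.
    """
    if sum(quotas.values()) != n:
        raise ValueError("quotas must sum to the number of recruits")
    rng = random.Random(seed)
    mos = list(quotas)
    # Hypothetical data: independent standardized composites, one per MOS.
    comp = [[rng.gauss(0, 1) for _ in mos] for _ in range(n)]
    noise_sd = (1 - validity ** 2) ** 0.5
    perf = [[validity * c + noise_sd * rng.gauss(0, 1) for c in row] for row in comp]

    if greedy:
        # Heuristic: take (recruit, MOS) pairs from the highest composite down.
        left = dict(quotas)
        assigned = {}
        pairs = sorted(((comp[i][j], i, j) for i in range(n) for j in range(len(mos))),
                       reverse=True)
        for _, i, j in pairs:
            if i not in assigned and left[mos[j]] > 0:
                assigned[i] = j
                left[mos[j]] -= 1
    else:
        # Baseline: fill the same quotas at random.
        slots = [j for j, m in enumerate(mos) for _ in range(quotas[m])]
        rng.shuffle(slots)
        assigned = dict(enumerate(slots))
    return sum(perf[i][j] for i, j in assigned.items()) / n
```

Because the same seed generates the same recruits in both branches, the
difference between the greedy and random results isolates the gain from
composite-based assignment under the assumed validity and quotas.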

Finally, the question of stability involves an appreciation of the costs of
altering enlistment procedures, as well as the statistical sophistication to
differentiate between real and chance variations. To make that distinction,
we plan to conduct careful and extensive cross-validations.

Procedures. Our procedures for developing the area composites include
several major steps. First, preliminary analyses will be performed to
determine the availability and adequacy of data to support the validity
analysis (e.g., sample sizes, kinds and characteristics of criterion
measures, and score scales). Second, we will address the methodological
issues concerning the problem of selectivity (restriction of range) and the
possibility of nonlinear relationships between predictors and criteria. The
results of these investigations will be used to make appropriate data
adjustments and/or transformations so that proper models can be applied to
conduct the validation. In addition, the issue of fairness will be examined
to ensure that the area composites to be developed are valid for groups of
special interest. Third, the number of area composites required to reliably
differentiate performance between MOS will be estimated by clustering the
MOS into homogeneous groups such that there is substantial validity
generalization among MOS within a group but differential validity across
groups. Fourth, validity analysis will be conducted for each MOS group to
define the best area composite for that group and to explore the impact of
different cutoff scores for selection into each MOS. Finally, we will carry
out cross-validations to lessen the impact of chance variations on the area
composites. We now describe each of these steps in detail.

Step  1:  Conduct  preliminary  analyses. 

The most important question we must address before the validation concerns
the availability and validity of criterion measures. The salient fact is
that although all, or nearly all, enlisted personnel take the ASVAB under
controlled conditions at a specified time, there is no similar uniformity of
criteria.


This subtask can be considered at this time only because of the major
effort undertaken by ARI to develop a data base of training outcomes on the
FY81/82 cohort. Nevertheless, those training outcomes were recorded at
various sites around the country, under various conditions, and it will not
be appropriate to treat these data as uniform without further documentation.
Work already undertaken by ARI has indicated some limitations of the
training data.

The meaning of the final course grades that constituted the primary
criterion in previous validations has also changed as a result of the recent
emphasis on objective-based training and mastery testing. Based on what we
have learned from the training performance data so far, we anticipate that
for many MOS the criterion scores will not have sufficient variability to
support meaningful validity analyses. Moreover, unless we rescale the grades
to make them comparable across courses/classes within an MOS, so that data
from different classes can be pooled for the analysis, we may not have
sufficient sample sizes to obtain reliable results.
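A minimal sketch of the rescaling, assuming each record carries a class
identifier and a final course grade (the record layout and the function name
are hypothetical): grades are converted to z-scores within each class so
that classes with comparable content can be pooled, and classes with no
grade variance are flagged as unusable rather than silently standardized.
Within-class standardization also removes any real differences in average
trainee quality between classes, so it is appropriate only once the classes
have been judged comparable.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def standardize_within(records):
    """records: list of (class_id, grade). Returns within-class z-scores.

    Grades from a class with zero variance come back as None, marking a
    criterion that cannot support a validity analysis.
    """
    by_class = defaultdict(list)
    for cls, grade in records:
        by_class[cls].append(grade)
    stats = {cls: (fmean(g), pstdev(g)) for cls, g in by_class.items()}

    out = []
    for cls, grade in records:
        mean, sd = stats[cls]
        out.append((grade - mean) / sd if sd > 0 else None)
    return out
```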

Therefore, we have begun to examine the similarity of the performance scores
between classes and schools, using the information provided in the ARI
documents, to be supplemented later with information that Task 3 staff are
collecting during their school visits. Once it can be determined that the
course contents are similar and the tests used are comparable, we will pool
those data for the actual analysis. When the data editing is completed, we
will also perform descriptive analyses by school and by MOS in order to
assess the score distributions and determine the sample sizes available for
analysis. Sample size is of particular concern when we want to perform
subgroup analyses in examining the fairness issue. Additionally, we will
standardize the course grades within each MOS so that the expected
performances will not reflect potentially large between-MOS differences. In
this regard, the ASVAB may be used as a common referent to "equate" course
grade or other criterion distributions across MOS.

We intend to supplement training outcome data with later performance
measures, such as EER and SQT scores. Because the FY81/82 cohort will have
been in the service for approximately 2.5 years at the time we carry out
these analyses, it may be possible to compare area composites based on
training outcomes to area composites based on field performance measures.
For the SQT scores, we will perform preliminary analyses to assess the
effects on soldiers' performance of the time interval between completion of
training and administration of the test. The results will tell us whether
we should take this time factor into account in validating the ASVAB tests
with SQT scores as the criterion.

Step  2:  Address  methodological  issues  and  make  data  adjustments. 


(1)  Problem  of  selectivity  bias  (restriction  of  range). 


The most serious issue that confronts us in conducting the validation in
the existing Army setting is the problem of selectivity. Because ASVAB
composites were used to select and assign the recruits, we expect
restriction of range in the subtest scores as a result of such implicit
selection (note that this selectivity problem is further complicated by the
fact that recruits can choose among the MOS for which they are qualified,
and there are guarantee and bonus options). This selectivity problem can
distort the relationship between the predictors and the criterion if the
selection variables are not all included in the prediction equation, and
can even introduce nonlinearity (Heckman, 1979).

The traditional approach to alleviating the effects of range restriction on
a validity investigation is to make corrections based on the assumptions of
linear regression and homoscedasticity. There are two ways to make the
corrections: the univariate model and the multivariate model. Sims and
Hiatt (1981) conducted a simulation to assess these correction methods and
found that the multivariate model is more effective and quite satisfactory
in reducing the errors of correlation estimates based on restricted samples.
The multivariate correction was formulated by Lawley (1943) and is not
difficult to apply. We plan to carry out adjustments of the correlations
using the FY81/82 applicant population as the reference (base) population.
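For reference, the multivariate correction can be written in block form.
Let S denote covariances in the selected sample and Sigma those in the
reference population, with x the explicitly selected variables (here,
presumably the subtests) and y the remaining variables (e.g., criteria).
Assuming linear, homoscedastic regressions of y on x, the corrected blocks
are Sigma_xy = Sigma_xx S_xx^-1 S_xy and
Sigma_yy = S_yy - S_yx S_xx^-1 S_xy + S_yx S_xx^-1 Sigma_xx S_xx^-1 S_xy.
The pure-Python sketch below implements these two formulas; the matrix
helpers are written out only to keep the example self-contained, and a
numerical library would normally be used instead.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, B):
    """Solve A W = B for W (B may have several columns) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # partial pivoting
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def lawley_correct(Sxx, Sxy, Syy, Sigma_xx):
    """Correct covariance blocks for explicit selection on the x variables.

    Sxx, Sxy, Syy: covariance blocks in the selected (restricted) sample.
    Sigma_xx: covariance matrix of x in the reference population.
    Returns the corrected blocks (Sigma_xy, Sigma_yy).
    """
    W = solve(Sxx, Sxy)                        # Sxx^-1 Sxy, regression of y on x
    Sigma_xy = matmul(Sigma_xx, W)
    Wt = transpose(W)
    explained_sample = matmul(Wt, Sxy)         # Syx Sxx^-1 Sxy
    explained_pop = matmul(Wt, matmul(Sigma_xx, W))
    q = len(Syy)
    Sigma_yy = [[Syy[i][j] - explained_sample[i][j] + explained_pop[i][j]
                 for j in range(q)] for i in range(q)]
    return Sigma_xy, Sigma_yy
```

With a single predictor, these formulas reduce to the univariate correction
r_c = u r / sqrt(1 - r^2 + u^2 r^2), where u is the ratio of the unrestricted
to the restricted predictor s.d.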

For the case of a dichotomous criterion, a classical solution for the
explicit selection case is available and is due to Gillman and Goode (1946).
Regrettably, there does not seem to be any completely satisfactory solution
for a dichotomous criterion in the implicit selection case. Because this is
the situation in which the ASVAB is validated against a dichotomous
criterion, it requires attention when we use a general performance indicator
such as attrition as the criterion for validation.

The existing literature on correcting for restriction of range contains
solutions for relatively simple cases; we need to investigate the
applicability of these solutions to the present project. In addition, we
will investigate the degree to which the statistical assumptions underlying
these correction procedures are violated and develop ways of coping with
difficult cases.

There have been a few Monte Carlo studies on the effect of using the
classical methods of correcting for selection when the assumptions are
violated (Novick & Thayer, 1969; Rydberg, 1963; Meredith, 1953; Srinivasan &
Weinstein, 1973; Greener & Osburn, 1980). In general, it appears that using
the correction formula is better than working with the uncorrected
correlation, but the performance of the formula worsens as the degree of
selection increases. For low to moderate selection (fewer than 40 percent
rejected), the corrected correlation is considerably better than the
uncorrected correlation. However, certain patterns of heterogeneous error
variance and curvilinear relationships result in unacceptable overestimates
of the population correlation (Greener & Osburn, 1980). We will use the
existing data on the FY81/82 cohort to estimate selection ratios and to
evaluate the assumptions of homogeneity of error variance and linear
relationships. If one or both of these assumptions are violated, we will
search for linearizing transformations.

(2)  Assumption  of  linear  relationships. 


The second issue we must deal with is the appropriateness of linear models
for the validation. In general, we assume that linear models will be
appropriate or at least a good approximation of other models. As noted
earlier, selectivity problems may introduce artificial nonlinearity. It is
also known that measurement errors (unreliabilities) in the regressor
variables frequently distort the underlying regression by introducing
nonlinearity into the model (Cochran, 1970; Lindley, 1947). For these
reasons, we need to check the linearity assumption. When the data suggest
nonlinearity, we will attempt to linearize the relationship by
transformations or by incorporating polynomial terms into the model. It
should be noted, however, that empirical research frequently finds
nonlinear relationships to be quite unstable and not replicable. Further,
previous research also suggests that most nonlinearities can be
satisfactorily approximated by polynomial functions that in effect render
the model additive (linear).


(3)  Moderator  effects  and  the  issue  of  cultural  fairness. 


The third issue, which is of great policy interest, is the cultural fairness
of the selection and classification procedures. For a successful
validation, the predictor and criterion measures should be reliable and free
from socio-cultural bias (ideally, from bias against any individual).

In accordance with current law, fairness or unfairness can be defined in
terms of the following condition: "when members of one race, sex, or ethnic
group characteristically obtain lower scores on a selection procedure than
members of another group, and the differences in scores are not reflected
in differences in a measure of job performance, use of the selection
procedure may unfairly deny opportunities to members of the group that
obtains the lower scores" (Miner & Miner, 1979). This interpretation is easy
to understand because commonly used selection procedures often employ a
scoring formula that combines the predictor scores into a single score on
which selection is based. In the development of ASVAB composites, it is
important that use of these composites in selection not result in bias
against any particular group.

There is a series of steps we can take in the present project to achieve
this objective. First, we need to identify the factors that are related to
group differences (with individual differences as the limiting case) in
predictive validities, as well as factors that may affect predictor and
criterion reliabilities. We are concerned with the reliabilities of both
predictors and criteria, even though criterion reliabilities are
traditionally considered less critical in validation efforts. This lack of
emphasis on criterion reliabilities largely stems from the belief that
measurement errors in the criteria do not affect accurate estimation of the
predictor-criterion relationship when Ordinary Least Squares (OLS)
procedures are applied. This is not entirely true, because such errors can
lead to underestimates of predictive validity. More important, when the
validation involves separate groups for whom the criterion has unequal
reliabilities (as may be the case in the present context), apparent group
differences in validity estimates may be mistaken for true group
differences in the strength of the predictor-criterion relationships.
Moreover, if a common relationship exists among groups, failure to take
into account the unequal reliabilities of the criterion in the OLS
estimation procedure can produce inconsistent estimates of the
relationship. Thus, in order to identify real group differences in the
validation results, we have to be concerned with criterion reliabilities as
well as predictor reliabilities. (It is well recognized that measurement
errors in predictors can cause underestimation of the relationship.)

The concept of moderator variables is central to the technique employed to
identify factors that may influence the predictor-criterion relationships.
A variable is said to be a moderator if it does not have a direct
relationship with the criterion but can influence the form and strength of
the predictor-criterion relationship (Sharma, Durand, & Gur-Arie, 1981). A
classical example is that the predictability of freshman college grades
tends to be higher for women than for men, so that sex is considered a
moderator (Saunders, 1956). This concept has been generalized to include
moderators of reliabilities as well (Ghiselli, 1963; Linn & Werts, 1971).
This usage is a direct extension of the preceding definition when we
consider reliability to be a relationship between repeated measures of the
same attribute (variable), that is, predicting the observed score from the
true score. Furthermore, joint moderators, which interact to influence the
predictor-criterion relationships, may also exist (Zedeck, Cranny, Vale, &
Smith, 1971).


The relevance of moderating effects in the context of fairness in selection
can be seen from an early definition of test bias: a test (predictor) is
regarded as biased when the regression lines computed separately for the
groups are different, because in that case the same scores on the test do
not give the same predictions for members of different groups (e.g.,
Anastasi, 1968; Cleary, 1968; Guion, 1966). This definition implies that a
test is biased (unfair) to individuals who differ in the moderator variable
that accounts for the regression differences. Linn and Werts (1971) further
point out two major ways that group differences can occur:

(a) there may be differences in predictor-criterion correlations or in
predictor reliabilities; (b) the regressions may differ in slope,
intercept, or standard error of estimate.

It is important to emphasize that we do not subscribe to this early
definition of test bias. A test may show different relationships to the
criterion for members of different groups but still be effectively employed
in a fair selection procedure. We fully agree with Cronbach's (1976)
distinction between predictive validity and appropriate selection policies,
and we consider a predictor battery to be inadequate only if it results in
different selection (or classification) efficiency after the different
relationships with the criteria among groups have been incorporated into
the selection system.

Because  predictive  validity  is  likely  necessary,  though  not 
sufficient,  for  effective  selection,  we  deem  it  fruitful  to 
investigate  it  by  means  of  moderators. 

There are several ways of identifying moderators that can be used to
improve predictions. The three methods most frequently employed in
empirical research are (Zedeck, 1971):

(a) Subgroup Analysis. Moderators are identified through comparisons of
predictions for different groups (Frederiksen & Melville, 1954);

(b) Prediction of Predictability. Correlates of the absolute difference
scores (the D scores) between standardized criterion and standardized
predictor scores are identified and used as "predictors of predictability"
(Ghiselli, 1955; 1960); and

(c) Moderated Multiple Regressions. Multiplicative (cross-product) terms
are introduced into the multiple regressions as new predictors (Saunders,
1955; 1956).
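Method (c) can be sketched directly. The following pure-Python example fits
the regression of the criterion on a predictor x and a candidate moderator
z, with and without the cross-product x*z, and reports the gain in R
squared; a nontrivial gain indicates that the slope of x depends on z. The
small Gauss-Jordan solver only keeps the example self-contained, and the
binary moderator and effect sizes in the test data are illustrative
assumptions.

```python
def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def r_squared(cols, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    w = solve(XtX, Xty)
    y_bar = sum(y) / len(y)
    sse = sum((yi - sum(wj * xj for wj, xj in zip(w, row))) ** 2 for row, yi in zip(X, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

def moderation_increment(x, z, y):
    """Gain in R^2 from adding the x*z cross-product to a main-effects model."""
    xz = [a * b for a, b in zip(x, z)]
    return r_squared([x, z, xz], y) - r_squared([x, z], y)
```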


While all of these methods have been employed to examine moderating effects
in validation research, the results obtained with these techniques
frequently differ. Further, moderators are not consistently substantiated
in replications. This prompted Ghiselli (1972) to remark, "it is possible
that moderators are as fragile and elusive as that other will-o'-the-wisp,
the suppressor variable." We recognize the limitations of the moderator
approach but regard it as a useful concept in the present validation effort
to support the development of a fair selection/classification system for
the Army.

In our opinion, the difficulties in moderator research lie in the
exploratory nature of the efforts. For this project, we shall take a
confirmatory approach. We do not intend to engage in a fishing expedition
to discover moderators. Instead, our research on moderating effects will
follow explicit rationales. Specifically, sex and race/ethnicity are two
variables we will examine closely for moderating effects because the law
explicitly prohibits discrimination with regard to these characteristics.
In addition, we will conduct a thorough review of the literature in
personnel selection and consult with personnel experts in the Army and our
research advisory panel to determine other candidate moderators and
establish clear justifications for examining them. (For example, in recent
years there seems to be growing concern about equal opportunities for
children from bilingual and rural/urban backgrounds; there are also
indications that high school graduation may exhibit moderating effects
[Sims & Hiatt, 1981].) For these special-interest groups, we will carry out
subgroup analyses in order to ensure that there are no differential
validities among the groups.

At present, we anticipate that the potential moderators we will investigate
will be qualitative (discrete) variables. Since natural groups can be
formed in these cases, we can perform subgroup analysis in order to
evaluate the effects of each potential moderator independently, and jointly
if joint moderating effects are also suspected. Obviously, we may not
always have sufficiently large samples to support the analyses. Where a
particular group is specifically excluded by law from certain jobs (for
example, women cannot be assigned to MOS such as combat engineer and tank
crewman), the issue of unfair selection for this group cannot be
addressed. However, where assignment is permitted but there simply have not
been a sufficient number of qualified applicants from some special-interest
groups, we will investigate alternative ways to evaluate moderating effects
for such cases. Because the selection of MOS for data collection in Task 5
has specifically taken sample sizes into consideration, we currently
estimate that, with few exceptions, we will have adequate samples to
perform the analysis of simple moderator effects, at least for these MOS,
but not necessarily for joint effects. If preliminary analyses uncover
evidence of important joint effects, we plan to obtain pooled estimates of
such effects by combining some MOS.
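For simple moderators defined by natural groups, a basic subgroup comparison
of validity coefficients is Fisher's z test for two independent
correlations, sketched below (compare_validities is a hypothetical name).
The examples also show why sample size matters: a difference of 0.45 versus
0.30 is far from significant with 60 cases per group, whereas 0.50 versus
0.20 with 400 cases per group is highly significant.

```python
from math import atanh, erf, sqrt

def compare_validities(r1, n1, r2, n2):
    """Two-sided Fisher z test that two independent validity coefficients are equal."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = 1 - erf(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p

# compare_validities(0.45, 60, 0.30, 60)   -> p well above 0.05
# compare_validities(0.50, 400, 0.20, 400) -> p far below 0.001
```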

Because assessment of validity is primarily based on the covariances
between the predictors and the criteria (the criteria are assumed to have
been put on comparable scales so that the differentiation is meaningful),
we propose to perform an initial analysis employing the linear structural
relations model (LISREL V; Joreskog & Sorbom, 1981) to ascertain the
similarity of the predictor-criterion relationships among groups.
(Linearity is assumed in the present discussion; if nonlinearity is
suspected, we will consider polynomial models or transformations aimed at
linearizing the relationships.) We will compare not only the
variance-covariance matrices but also the mean vectors, because mean
differences can also lead to different regressions (i.e., different
intercepts). A set of hierarchical hypotheses can be tested to detect the
exact form of differences among the groups. If differences do exist for one
or more criteria, we will either perform the validation analysis for each
group separately in order to select the best set of composites for each
group, or make adjustments to the composite scores for one of the groups.
If group differences are not detected during the initial analysis, the
particular characteristic being examined will not be treated as a moderator
and will not be entered into further validations.

In practice, a moderator can function in several ways. Group differences
may be found with respect to the strength of the relationships, their form,
or both. In addition, a variable may bear a direct relationship with the
criterion (and thus be useful as a predictor by itself) while at the same
time influencing the relationships between other predictors and the
criterion. Furthermore, if measurement errors are present, a difference in
strength may be explained by differences in reliabilities, in true
relationships, or both. Thus, it is possible to distinguish among various
types of "moderators" (Sharma et al., 1981).

The results of these moderator analyses will be applied to the computation
of the ASVAB composite (area) scores that are now the bases for
selection/assignment of recruits to military services. Development of
group-specific composites may be required in order to improve the
efficiency of the ASVAB as a selection instrument. If such composites are
obtained, we will have to compare their utilities with those of the earlier
composites, so that appropriate recommendations can be made on whether to
use separate composites for special-interest groups. On the basis of
previous validations (e.g., Maier, 1981), we do not anticipate finding
group differences consistent enough to warrant the use of group-specific
composites.


Step  3:  Determine  the  appropriate  number  of  area  composites. 


The number of area composites to be defined will be determined by the
extent of validity generalization across MOS. Accordingly, we will cluster
the MOS on the basis of the similarities between their relationships with
the predictors. The clustering will be accomplished in three computational
steps.


(1)  Compute  predictor  profiles. 

We plan to use linear regression to produce for each MOS a "profile" of
predictors of performance in the MOS. This profile would be a vector of
standardized regression coefficients, including both the ASVAB and
specified additional measures, such as level of education.

(2) Eliminate unreliable cases.

Before proceeding, we plan to examine the results, MOS by MOS, to eliminate
any outliers or any profiles that appear to be largely a function of error
components. This will involve not only direct examination of the computer
plots of the residuals but also comparison of the results with known and
hypothesized sources of error variance.

(3)  Hierarchically  cluster  the  profiles. 

We  plan  to  use  a  procedure  such  as  SAS  PROC  CLUSTER  to 
identify  which  MOS  have  similar  profiles  of  performance 
predictors.  We  will  examine  the  clusters  produced  for  varying 
specifications  of  the  number  of  allowable  clusters,  to  determine 
the  benefits  to  be  gained  by  adding  each  additional  cluster. 
Initially,  we  will  obtain  nine  clusters  to  check  against  the  nine 
MOS  groups  for  which  current  composites  are  defined.  Because  the 
current  MOS  groups  have  been  in  use  for  some  time  and  were 
formed  on  the  basis  of  expert  judgments,  any  differences  we  find 
between  our  results  and  these  groups  will  be  carefully  analyzed  to 
explain  the  nature  of  the  differences.  We  will  also 
compare  our  clustering  results  with  other  clusters  obtained  on  the 
basis  of  job/task  requirement  analyses,  such  as  that  being 
performed  by  RCA  or  that  being  investigated  in  Task  5.  If 
substantial  differences  exist,  we  may  have  to  reconcile  the 
clusters  by  trying  other  methods.  The  number  of  clusters  in  the 
best  solution  will  be  taken  as  the  number  of  composites  that  must  be 
defined  to  adequately  differentiate  among  the  MOS. 
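The plan names a procedure such as SAS PROC CLUSTER for step (3). Purely as a stand-in, the following pure-Python sketch performs average-linkage agglomerative clustering of the MOS profiles, assuming Euclidean distance between beta-weight vectors, and records the merge distance at each step so that the gain from each additional cluster can be inspected. It is not a reproduction of the SAS procedure.

```python
import math
from itertools import combinations


def average_linkage(profiles, n_clusters):
    """Agglomerative (average-linkage) clustering of MOS predictor profiles.

    profiles: dict MOS -> sequence of beta weights.
    Returns (clusters, history): clusters is a list of sets of MOS labels;
    history lists (merge distance, clusters remaining) for each merge.
    """
    clusters = [{mos} for mos in profiles]
    history = []
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average Euclidean distance over all between-cluster MOS pairs.
            d = sum(math.dist(profiles[a], profiles[b])
                    for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]   # j > i, so index i is unaffected
        history.append((d, len(clusters)))
    return clusters, history
```

A sharp rise in merge distance between successive entries of `history` marks the point at which forcing MOS together begins to cost information.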

Step  4:  Define  the  area  composites  for  the  MOS  groups  and  set  the  cutoffs. 

The  mean  profile  of  predictors  for  each  cluster  is  the  best  overall 
predictor  of  performance  in  that  cluster  and  as  such  satisfies  the 
maximization  requirement  for  the  Area  Composite.  The  beta  weights  will  be 
transformed  to  integer  weights,  preferably  unit  weights  in  keeping  with 
traditional  practice.  The  resulting  composites  will  then  be  evaluated  in 
terms  of  their  validities  for  each  individual  MOS  within  the  group.  If 
these  validities  are  approximately  equal  to  those  obtained  by  the  individual 
profiles,  we  will  suggest  adoption  of  the  composites  and  cutoffs  for 
selection  into  the  individual  MOS  (see  below  for  a  discussion  of  setting  the 
cutoffs). 
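One way to carry out this comparison is sketched below. It assumes standardized predictors and a hypothetical threshold of 0.10 for retaining a predictor in a unit-weighted composite; operational composites sum specified subtest scores, so both the threshold and the standardization are illustrative. The unit-weighted validity for an MOS can then be set beside the multiple correlation obtained with that MOS's own least-squares weights.

```python
import numpy as np


def unit_weights(mean_beta, threshold=0.10):
    """Convert mean beta weights to unit (1/0) weights.

    Predictors whose mean beta reaches the (hypothetical) threshold get
    weight 1; the rest are dropped.
    """
    return (np.asarray(mean_beta, dtype=float) >= threshold).astype(float)


def composite_validity(X, y, weights):
    """Correlation of y with a weighted sum of standardized predictors."""
    X = np.asarray(X, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    composite = Xz @ np.asarray(weights, dtype=float)
    return float(np.corrcoef(composite, y)[0, 1])


def multiple_r(X, y):
    """Multiple correlation of y with its own least-squares prediction."""
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return float(np.corrcoef(X1 @ coef, y)[0, 1])
```

In the development sample the unit-weighted validity cannot exceed the multiple correlation, so the question is only how close it comes for each MOS in the cluster.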


Computation  of  the  mean  profile  for  a  cluster  will  be  weighted  by  the  size 
and,  if  available,  the  importance  of  each  MOS,  and  MOS  that  are  close  to  the 
boundary  between  two  clusters  will  be  considered  for  logical  reassignment  to  a 
different  cluster.  Furthermore,  if  odd  clusters  of  MOS  emerge,  we  will 
entertain  the  hypothesis  that  deficiencies  in  the  outcome  measures  led  to  a 
distorted  placement  of  the  MOS.  It  is  important  for  the  Area 
Composites  to  be  credible,  or  the  likelihood  of  their  proper  use  will  be 
low;  so  all  unusual  placements  will  be  carefully  considered. 
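The weighting and boundary check described above can be sketched as follows. The sketch assumes profiles stored as lists of beta weights, MOS sizes as accession counts, and an optional importance rating whose scale is hypothetical. The boundary ratio (distance to the MOS's own cluster mean divided by distance to the nearest other cluster mean) is one simple, assumed way to flag MOS that sit near a cluster boundary.

```python
import math


def weighted_mean_profile(profiles, sizes, importance=None):
    """Mean predictor profile for one cluster.

    profiles: MOS -> list of beta weights; sizes: MOS -> number of
    accessions; importance: optional MOS -> importance rating (hypothetical).
    """
    mos_list = list(profiles)
    weights = [sizes[m] * (importance[m] if importance else 1.0) for m in mos_list]
    total = sum(weights)
    n_pred = len(profiles[mos_list[0]])
    return [sum(w * profiles[m][k] for w, m in zip(weights, mos_list)) / total
            for k in range(n_pred)]


def boundary_ratio(profile, own_mean, other_means):
    """Distance to own cluster mean over distance to the nearest other
    cluster mean; values near 1.0 flag MOS lying near a cluster boundary."""
    d_own = math.dist(profile, own_mean)
    d_other = min(math.dist(profile, m) for m in other_means)
    return d_own / d_other
```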


Careful  comparison  of  the  results  with  prior  research  and  with  current  and 
past  procedures  is  critical.  Change  must  not  be  instituted 
just  for  the  sake  of  change,  because  there  are  substantial  costs, 
particularly  in  terms  of  training  for  use  of  new  procedures,  that  must  be 
factored  into  the  solution.  On  the  other  hand,  we  will  have  the  opportunity 
to  measure  the  amount  of  information  lost  by  forcing  clusters,  rather  than 
providing  a  separate  predictor  vector  for  each  MOS.  This  will  provide  one 
small  part  of  the  justification  for  the  more  comprehensively  designed  and 
validated  enhancements  to  be  developed  during  the  remainder  of  the  project. 

As  discussed  earlier,  it  is  important  that  the  defined  composites  be  fair 
to  all  groups  when  used  in  selection  and  classification.  The  area 
composites  will  be  refined,  if  necessary,  to  ensure  their 
validity  for  various  groups.  Throughout  the  subtask,  care  will  be  taken 
to  preserve  the  capability  of  analyzing  results  separately  by  race,  sex,  and 
other  key  variables.  In  particular,  once  clusters  are  defined,  the 
predictor  weights  will  be  estimated  independently  for  individual  groups,  in  order 
to  identify  MOS  for  which  the  selected  Area  Composite  is  relevant  for  one 
group  but  not  another.  Should  we  find  substantial  effects,  we  will  present 
these  results  to  advisors  who  can  evaluate  the  implications  of  various 
approaches  to  adjusting  for  the  differences. 
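The plan does not fix a method for these group-by-group comparisons. One common check, assumed here for illustration only, fits the regression of the criterion on the area composite separately within each group and compares slopes, intercepts, and validities; a marked difference would flag the MOS for review by the advisors. The data layout is hypothetical.

```python
import numpy as np


def group_regressions(composite, criterion, group):
    """Fit criterion = intercept + slope * composite within each group.

    Returns a dict: group label -> (intercept, slope, validity r).
    """
    composite = np.asarray(composite, dtype=float)
    criterion = np.asarray(criterion, dtype=float)
    group = np.asarray(group)
    results = {}
    for g in np.unique(group):
        mask = group == g
        # np.polyfit with degree 1 returns [slope, intercept].
        slope, intercept = np.polyfit(composite[mask], criterion[mask], 1)
        r = np.corrcoef(composite[mask], criterion[mask])[0, 1]
        results[str(g)] = (float(intercept), float(slope), float(r))
    return results
```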

Although  the  DCSPER  has  the  final  authority  to  decide  on  the  cutoffs  for 
selection  into  an  MOS,  we  will  examine  the  effects  of  alternative  cutoffs  on 
the  selection  outcomes  in  order  to  recommend  appropriate  cutoffs  to  the 
Army.  These  cutoffs  will  be  calculated  in  terms  of  the  trade-off  between 
failing  to  find  qualified  applicants  and  failing  to  eliminate  future 
wash-outs.  If  possible,  we  will  also  calculate  multiple  cutoffs  that 
can  be  used  upon  notification  of  a  change  in  the  urgency  of  filling  an  MOS. 
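A minimal sketch of the trade-off calculation follows, assuming that for a past cohort each person's composite score and a later adequate-performance/wash-out outcome are known. For each candidate cutoff it counts qualified people who would have been turned away and wash-outs who would have been accepted; a lower cutoff read from such a table could be kept ready for MOS that must be filled urgently. The data layout is an assumption.

```python
def cutoff_tradeoffs(scores, succeeded, cutoffs):
    """Count both kinds of selection error for each candidate cutoff.

    scores: composite scores; succeeded: True if the person performed
    adequately (did not wash out). Returns a list of
    (cutoff, qualified_rejected, washouts_accepted).
    """
    rows = []
    for c in cutoffs:
        qualified_rejected = sum(1 for s, ok in zip(scores, succeeded)
                                 if s < c and ok)
        washouts_accepted = sum(1 for s, ok in zip(scores, succeeded)
                                if s >= c and not ok)
        rows.append((c, qualified_rejected, washouts_accepted))
    return rows
```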

Step  5:  Conduct  cross-validations. 

Finally,  cross-validations  will  be  carried  out  to  determine  the  extent  to 
which  the  results  represent  real  variation  in  ability  requirements  among 
MOS,  as  opposed  to  chance  variation  or  artifactual  variation  in  criterion 
measures. 

In  cross-sectional  research,  cross-validation  is  accomplished  by  dividing 
the  available  sample  of  cases  into  two  or  more  pseudo-replicate  samples. 
The  simplest  design  randomly  divides  the  available  sample  into  two  halves, 
develops  the  equation  on  one  half,  and  cross-validates  it  on  the  other. 
This  is  in  fact  the  classical  cross-validation  design.  When  the  available 
sample  is  large  enough  to  permit  adequate  precision,  it  can  be  divided 
into  two  pseudo-replicates  and  the  calculations  carried  out  in  both.  In 
general,  however,  the  division  into  the  two  pseudo-replicates  is  not  done 
randomly,  and  thus  care  must  be  taken  to  see  that  the  splitting  results  in 
half-samples  that  are  comparable. 
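The classical random split-half design, with the equation also developed on the second half and validated on the first (double cross-validation), can be sketched as follows. Ordinary least squares and the function names are assumptions for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def double_cross_validation(X, y, seed=0):
    """Randomly halve the sample; develop on each half, validate on the other.

    Returns [(r_dev, r_val), (r_dev, r_val)], one pair per direction, where
    r_dev is the validity in the development half and r_val the
    cross-validity in the other half.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    half_a, half_b = idx[: len(y) // 2], idx[len(y) // 2:]
    results = []
    for dev, val in ((half_a, half_b), (half_b, half_a)):
        coef = fit_ols(X[dev], y[dev])
        r_dev = np.corrcoef(predict(coef, X[dev]), y[dev])[0, 1]
        r_val = np.corrcoef(predict(coef, X[val]), y[val])[0, 1]
        results.append((float(r_dev), float(r_val)))
    return results
```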


Snee  (1977)  has  developed  an  algorithm  for  splitting  the  data  and  evaluating 
the  comparability  of  the  two  half-samples.  This  algorithm,  as  implemented 
in  a  computer  program  called  DUPLEX,  divides  the  data  into  two  subsets  that 
cover  roughly  the  same  region  of  the  predictor  space.  The  predictor  scores 
are  standardized  and  orthonormalized,  and  the  Euclidean  distance  between  all 
possible  pairs  of  points  is  calculated.  The  two  points  that  are  farthest 
apart  are  assigned  to  the  estimation  set.  The  two  points  that  are  farthest 
apart  among  the  remaining  values  are  assigned  to  the  validation  set.  Then, 
the  point  that  is  farthest  from  the  two  points  in  the  estimation  set  is 
added  to  the  estimation  set;  the  point  that  is  farthest  from  the  two  points 
in  the  validation  set  is  added  to  the  validation  set;  the  point  now  farthest 
from  the  estimation  set  is  added  to  it,  and  so  on.  This  pattern  of  alternating 
assignments  continues  until  all  the  points  have  been  assigned  to  one  of  the 
two  sets.  The  comparability  of  the  two  sets  is  evaluated  by  computing  the 
ratio  of  the  determinants  of  the  inverses  of  their  information  matrices. 
Since  each  determinant  is  proportional  to  the  generalized  variance  of  the 
corresponding  predictor  space,  this  ratio  will  be  approximately  1.0  when  the 
two  half-samples  contain  roughly  the  same  amount  of  information. 
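A compact sketch of the DUPLEX procedure as described above follows. It is written from this description, not from Snee's program, and three choices are assumptions: "farthest from the points already in a set" is read as the largest minimum distance to that set, orthonormalization uses a Cholesky factor, and the comparability measure is the plain determinant ratio named in the text.

```python
import numpy as np


def duplex_split(X):
    """Split the rows of X into estimation and validation sets, DUPLEX-style."""
    X = np.asarray(X, dtype=float)
    if len(X) < 4:
        raise ValueError("need at least four cases")
    # Standardize, then orthonormalize so that W.T @ W is the identity.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    L = np.linalg.cholesky(Z.T @ Z)
    W = Z @ np.linalg.inv(L).T
    # Euclidean distance between every pair of cases.
    D = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=2)
    remaining = set(range(len(W)))

    def farthest_pair():
        rem = sorted(remaining)
        sub = D[np.ix_(rem, rem)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        pair = [rem[i], rem[j]]
        remaining.difference_update(pair)
        return pair

    est = farthest_pair()   # two points farthest apart
    val = farthest_pair()   # two farthest apart among those left
    target, other = est, val
    while remaining:
        rem = sorted(remaining)
        # Each remaining point's distance to its nearest member of target;
        # the point for which this is largest joins target.
        nearest = D[np.ix_(rem, target)].min(axis=1)
        pick = rem[int(np.argmax(nearest))]
        target.append(pick)
        remaining.discard(pick)
        target, other = other, target   # alternate between the two sets
    return sorted(est), sorted(val), W


def comparability(W, est, val):
    """Ratio of det(inverse information matrix), estimation over validation.

    Near 1.0 when the two halves carry about the same information.
    """
    inv_e = np.linalg.inv(W[est].T @ W[est])
    inv_v = np.linalg.inv(W[val].T @ W[val])
    return float(np.linalg.det(inv_e) / np.linalg.det(inv_v))
```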

Although  the  point  has  not  been  much  discussed  in  the  literature,  it  may  be 
advantageous  to  define  multiple  sets  of  pseudo-replicates,  and  repeat  the 
estimation  and  cross-validation  process  within  each.  Clearly,  when  the 
total  sample  size  is  large,  there  is  a  very  large  number  of  possible 
pseudo-replicates  that  could  be  formed,  and  a  single  pair  may  not  represent 
an  adequate  sample.  The  objective  is  to  obtain  the  greatest  possible 
precision  with  the  fewest  possible  sets  of  pseudo-replicates. 




McCarthy  (1976)  has  argued  strongly  that  a  "balanced"  sampling  plan  is  the 
most  effective.  In  particular,  he  has  argued  for  a  strategy  developed  for 
the  analysis  of  complex  sample  surveys  known  as  Balanced  Half  Sample 
Replications  (BHSR).  His  analyses  of  simple  cases  have  shown  that  this  method  is 
substantially  more  efficient  than  the  traditional  split  into  random  halves. 
His  simulation  results  for  more  complex  cases  suggest  further  that  the 
advantages  of  BHSR  are  maintained  for  the  cross-validation  of  multiple 
regression  equations. 
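Balanced half-samples are conventionally built from a Hadamard matrix. The sketch below assumes the sample has been arranged into strata that each contain two comparable halves (for example, paired sites or testing sessions, an assumption here); in each replicate a +1 entry selects the first half of a stratum and -1 the second. Using the Sylvester construction restricts the number of replicates to a power of two, which is a simplification.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of a power-of-two order (Sylvester construction)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def balanced_half_samples(n_strata):
    """List of replicates; each gives, per stratum, the half (0 or 1) used.

    Columns 1..n_strata of the Hadamard matrix are used (column 0 is all
    +1), so each stratum contributes each half equally often and the
    choices for any two strata are orthogonal, which is the balance
    property.
    """
    order = 1
    while order <= n_strata:
        order *= 2
    H = sylvester_hadamard(order)
    return [[0 if H[r][h + 1] == 1 else 1 for h in range(n_strata)]
            for r in range(order)]
```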

Balanced  Half  Sample  Replication  appears  to  extract  nearly  all  the 
information  available  from  all  possible  pseudo-replicates.  Since  the  cost  of 
BHSR  analysis  is  substantially  less  than  that  of  the  complete  pseudo-replicates 
analysis,  this  represents  a  considerable  advantage.  In  conjunction  with  the 
more  effective  of  the  "shrinkage"  estimation  formulas  (e.g.,  Wherry,  1931; 
Olkin  &  Pratt,  1958),  we  plan  to  use  the  work  of  Snee  to  select  and  evaluate 
half-samples,  and,  if  the  sampling  design  for  troops  is  balanced,  to  use  BHSR  to 
define  multiple  sets  of  half-samples  for  cross-validation. 
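For reference, the two shrinkage estimators can be written as below: the Wherry-type adjustment, and a frequently quoted two-term approximation to the Olkin and Pratt (1958) estimator (the exact estimator involves a hypergeometric function and is not shown). Both estimate the population squared multiple correlation from a sample R squared with n cases and k predictors; they complement rather than replace empirical cross-validation.

```python
def wherry_adjusted_r2(r2, n, k):
    """Wherry-type shrunken estimate of the population R^2."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def olkin_pratt_approx_r2(r2, n, k):
    """Common two-term approximation to the Olkin-Pratt estimator."""
    return 1 - (n - 3) / (n - k - 1) * (1 - r2) * (1 + 2 * (1 - r2) / (n - k + 1))
```

For example, a sample R squared of .25 with 101 cases and 4 predictors shrinks to about .219 under the Wherry-type formula and about .223 under the approximation.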

Further,  when  data  are  collected  over  several  months,  it  is  possible  to 
split  the  data  at  a  particular  point  in  time,  and  to  use  the  data  collected 
after  that  point  to  validate  the  model  constructed  from  the  earlier 
data.  This  type  of  design  is  useful  in  evaluating  the  stability  of 
relationships.  As  it  will  take  several  months  to  accumulate  the  data  from 
the  major  cohort  test,  it  is  possible  to  apply  a  form  of  rolling 
longitudinal  cross-validation  design  to  these  data,  and  to  cross-validate 
both  within  and  across  time.  This  will  make  it  possible  to  separate  the 
effects  of  instability  from  the  effects  of  sampling  error.  Naturally,  use 
of  such  a  longitudinal  design  is  predicated  on  the  characteristics  of  the 
accession  sample  not  varying  greatly  over  time. 
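A rolling version of the time-split design is sketched below under stated assumptions: each case carries an accession-month index, the equation is refit on all cases accessed before each cut month, and it is validated on the cases accessed in the following window. Comparing these time-based validities with split-half validities from the same months is one way to separate instability from sampling error. The names and the minimum-size rule are hypothetical.

```python
import numpy as np


def rolling_cross_validation(X, y, month, first_cut, window=1):
    """Develop on cases before each cut month; validate on the next window.

    Returns a list of (cut month, cross-validity r).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    month = np.asarray(month)
    results = []
    for cut in range(first_cut, int(month.max()) + 1, window):
        dev = month < cut
        val = (month >= cut) & (month < cut + window)
        # Skip cuts with too few cases to fit or to validate.
        if dev.sum() <= X.shape[1] + 1 or val.sum() < 3:
            continue
        X1 = np.column_stack([np.ones(dev.sum()), X[dev]])
        coef, *_ = np.linalg.lstsq(X1, y[dev], rcond=None)
        pred = np.column_stack([np.ones(val.sum()), X[val]]) @ coef
        results.append((cut, float(np.corrcoef(pred, y[val])[0, 1])))
    return results
```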

Where  feasible,  this  approach  will  be  used  to  ensure  that  the  results  are 
generalizable  to  accessions  who  enter  at  different  times  of  the  year. 
Longitudinal  analyses  of  this  kind  are  far  more  powerful  and  efficient  than 
using  longitudinal  data  to  do  repeated  cross-sectional  analyses. 

Prepare  Interim  Report(s).  We  expect  to  perform  this  subtask  jointly  with 
ARI  staff.  We  will  provide  weekly  reports  to  document  and  inform  ARI  of  our 
rate  of  progress,  so  that  there  will  be  no  unexpected  failures  to  deliver 
products.  We  expect  to  present  two  major  interim  reports:  one  in  July, 
1983,  for  the  Review  Committee,  in  which  we  will  carefully  examine  the 
implications  of  the  proposed  approaches;  and  one  in  October  1983,  in  which 
we  will  indicate  the  range  of  modifications  that  will  be  needed  in  the  area 
composites. 

The  results  of  the  analyses  will  be  carefully  reviewed.  An  adequate  amount 
of  time  will  be  allocated  to  ensure  that  any  reanalyses  required 
by  ARI  can  be  comfortably  carried  out  prior  to  the  March  31,  1984,  ending 
date  for  this  subtask. 


Prepare  Final  Report.  We  will  submit  a  final  report  at  the  end  of  March, 
1984,  containing  all  information  needed  to  judge  whether  the  use  of  new 
area  composites  for  enlisted  personnel  selection  and  classification  is 
warranted. 

Second  Analysis  of  ASVAB  Scoring  Procedures 

The  first  analysis  of  ASVAB  scoring  procedures,  the  identification  and 
validation  of  possible  new  area  composite  scores  using  FY81/82  cohort  data, 
will  be  completed  early  in  1984.  At  the  same  time,  new  data  on  the  FY83/84 
cohort  will  become  available  from  other  Project  A  tasks  for  the  19  MOS  in 
our  sample,  and  work  can  begin  on  a  second  analysis  of  the  ASVAB  scoring 
procedures.  Data  on  new  predictor  and  performance  measures  will  be 
collected  on  this  cohort  between  June  and  September  of  1985.  This  second 
analysis  will  begin  to  bridge  the  gap  between  the  current  optimization 
procedure  and  computerized  optimization.  The  major  ingredient  in  the  new 
capability  will  be  the  existence  of  utility  scales  that  allow  cross-MOS 
comparisons  of  the  value  of  good  versus  poor  performance.  It  will  no  longer 
be  necessary  to  perform  assignments  based  on  the  assumption  that  all 
performance  increments  are  of  equal  value. 

This  second  analysis  will  be  undertaken  in  two  parallel  designs,  one  of 
which  partially  replicates  the  previous  analysis  of  area  composites  and  the 
other  of  which  uses  a  prototype  of  the  CPAS  optimization  algorithm.  In  this 
way,  it  will  be  possible  to  examine  thoroughly  the  implications  of  changing 
to  the  CPAS,  in  terms  of  both  benefits  and  costs.  At  the  same  time,  the 
partial  replication  of  the  earlier  area  composite  effort  will  provide  the 
basis  for  estimating  the  stability  of  the  measured  MOS  ability  requirements 
over  time.  The  previous  method  for  developing  area  composites  will  be 
modified  to  include  the  new  utility  scales,  in  both  the  hierarchical 
clustering  algorithm  and  the  computation  of  mean  predictor  profiles  for  each 
cluster.  In  the  clustering  algorithm,  the  utility  scales  will  enter 
as  a  weight  on  the  difference  between  profiles:  two  distant  MOS  should  be 
less  likely  to  be  forced  to  share  a  common  area  composite  if  good 
performance  in  each  is  critical  than  if  it  is  not. 
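The text specifies only that utility enters as a weight on the difference between profiles. One hypothetical form is sketched below: the Euclidean profile distance is multiplied by the mean utility of good performance in the two MOS, so that pairs of high-utility MOS are pushed farther apart and are merged later by any agglomerative clustering routine that uses this distance.

```python
import math


def utility_weighted_distance(profile_i, profile_j, utility_i, utility_j):
    """Profile distance scaled by the mean utility of good performance in
    the two MOS (hypothetical weighting). Pairs of high-utility MOS end up
    farther apart, so a clustering routine is less willing to merge them."""
    return math.dist(profile_i, profile_j) * (utility_i + utility_j) / 2
```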

In  addition  to  these  differences,  the  second  ASVAB  analysis  will  benefit 
from  the  availability  of  new  criterion  measures  being  developed  by  Project 
A.  While  the  operationalization  of  the  ASVAB  scoring  procedures  will 
continue,  at  this  stage,  to  be  in  terms  of  area  composites,  these  new 
sources  of  information  may  lead  to  more  valid  definition  of  composites. 

For  this  second  analysis  of  the  ASVAB  composites,  we  will  be  validating 
ASVAB  forms  11/12/13  by  assessing  the  discriminant  validity  of  alternative 
area  composites.  The  analysis  will  be  performed  between  December,  1986  and 
March,  1987.  A  final  report  will  be  submitted,  with  recommendations  on 
possible  revisions  of  the  ASVAB  composites  and  new  cutoffs  if  required. 


Section  2:  Support  for  the  Development  of  New/Improved  Measures 


This  section  describes  the  procedures  for  Subtasks  1.2  through  1.5  in  the 
draft  Integrated  Master  Plan.  The  research  team  carrying  out  each  task 
possesses  ample  analytical  expertise  to  achieve  its  objectives. 
Nevertheless,  the  design  of  Project  A  has  concentrated  analytical  and  data 
processing  capabilities  in  Task  1,  and,  in  order  to  maximize  the  benefits  of 
carrying  out  this  effort  as  a  single  project,  Task  1  staff  will  assist  the 
analysts  in  other  tasks  to  promote  efficiency  and  a  uniform  level  of 
statistical  and  psychometric  sophistication. 


This  assistance  is  implicit  in  the  Research  Plans  for  the  other  tasks,  and 
the  time  lines  for  the  assistance  correspond  to  the  requirements  of  those 
tasks.  In  this  section,  we  briefly  describe  the  current  plans  for 
analytical  support  for  each  of  the  other  tasks.  Support  for  data  entry, 
editing,  and  access  for  the  LRDB  is  described  in  the  Draft  Longitudinal 
Research  Data  Base  Plan.  Many  of  the  analyses  of  the  field  test  and  FY83/84 
concurrent  and  longitudinal  first-tour  sample  data  will  be  accomplished 
under  these  support  subtasks. 


Support  for  the  Development  of  Pre-induction  Predictors  (Task  2).  As  a  part 
of  the  development  of  ASVAB  area  composite  scores,  Task  1  will  identify  MOS 
for  which  the  current  ASVAB  has  the  greatest  and  least  validity,  after 
appropriate  data  adjustments,  in  order  to  suggest  the  areas  in  greatest  need  of 
improvement.  This  information  will  result  from  the  examination  of  the 
existing  ASVAB  composites  described  in  the  preceding  section  (Subtask  1.7). 
Preliminary  results  will  be  passed  to  Task  2  in  August,  1983,  for  use  in 
their  planning  for  the  development  of  new  measures. 

In  planning  for  analyses  to  support  the  development  of  measures,  Task  1  will 
assist  in  sampling  design  for  the  field  tests,  in  the  development  of 
approaches  for  Computer-Assisted  Testing  (CAT),  and  in  testing 
assumptions  about  the  relative  construct  validity  of  different 
predictors. 

Task  1  will  focus  particular  attention  on  the  problems  and  potential  of  CAT 
and  will  attempt  to  provide  Task  2  staff  with  the  knowledge  gained  from 
validation  of  the  CAT  forms  of  the  ASVAB.  Integration  of  CAT 
with  tailored  testing  and  with  latent  trait  theory  will  require  significant 
research  efforts,  and  Task  1  will  monitor  the  results  of  research  on  CAT 
being  carried  out  elsewhere. 

Field  tests  of  new  pre-induction  predictors  will  be  carried  out  in  1983/84 
with  the  Preliminary  Predictor  Battery  and  in  1985  with  the  Trial  Predictor 
Battery.  Task  1  staff  will  enter  the  data  into  the  LRDB  and  then  merge 
these  data  with  various  criteria  in  order  to  identify  those  predictors  that 
contribute  the  greatest  increment  to  differential  validity  for  MOS 
assignment.  In  order  to  meet  Task  2  needs  for  creation  of  the  Experimental 
Predictor  Battery,  preliminary  results  of  the  analysis  of  the  FY83/84 
first-tour  concurrent  validation  data  must  be  made  available  to  Task  2  by 
December,  1985. 

Task  1  staff  will  provide  Task  2  with  edited  data  for  their  analyses  of  the 
psychometric  properties  of  the  predictor  measures  and  will  assist  in  these 
analyses.  Task  1  staff  will  similarly  assist  the  other  tasks  in  the 
analysis  of  the  psychometric  properties  of  the  various  criterion  measures. 

Task  1  staff  will  then  perform  preliminary  validation  analyses  to  provide 
Task  2  with  information  on  the  contribution  to  overall  predictive  validity 
made  by  each  predictor  measure  and  information  on  any  problems  associated 
with  the  differential  validity  of  particular  measures  across  gender  and 
race/ethnic  groups.  These  analyses  are  mentioned  in  Subtask  9  of  the  Task  2 
research  plan.  Our  approach  to  specific  issues  in  these  analyses  is 
described  in  the  following  section  (Section  3)  on  the  Validation  of  New  and 
Improved  Predictors  of  Army  Performance. 

The  support  for  Task  2  will  be  facilitated  by  the  inclusion  of  staff  at  the 
site  carrying  out  Task  2  (i.e.,  Minneapolis)  as  a  part  of  the  Task  1  staff. 

Support  for  the  Development  of  Training  Outcome  Measures  (Task  3).  The 
initial  analyses  aimed  at  development  of  ASVAB  composites  will  be  based  on 
the  extent  to  which  ASVAB  subtests  predict  existing  training  outcome 
measures  in  different  MOS.  In  the  course  of  these  analyses,  Task  1  staff 
will  rely  on  information  gathered  by  Task  3  in  its  evaluation  of  the 
existing  measures  (Subtask  3.2).  In  turn,  the  use  of  these  data  as 
criterion  measures  will  also  identify  strengths  and  weaknesses  in  current 
training  outcome  measures  that  will  be  useful  to  the  staff  of  Task  3  in 
their  evaluation  of  these  measures. 

In  planning  for  analyses  for  Task  3,  the  staff  of  Task  1  will  assist  in 
sampling  design  and  in  the  identification  of  particular  confounding  factors 
that  affect  training  outcomes,  including  variations  between  training  sites. 
In  carrying  out  the  concurrent  validation  on  the  FY83/84  cohort,  Task  1  will 
compare  Task  3  measures  with  those  being  considered  in  Tasks  4  and  5  in 
order  to  develop  hypotheses  about  the  structure  of  the  criterion  space. 
Similarly,  the  structural  models  tested  in  the  concurrent  validation  will 
provide  essential  information  on  the  use  of  training  measures  as  predictors 
of  subsequent  performance.  This  information  will  be  fed  back  to  Task  3 
staff  to  aid  in  their  conduct  of  Subtask  3.8. 

The  proximity  of  the  sites  at  which  Tasks  1  and  3  are  headquartered,  in  Palo 
Alto  and  Carmel,  California,  will  facilitate  communication  between  these 
tasks. 

Support  for  the  Development  of  Army-wide  Criteria  (Task  4).  In  the  course 
of  analysis  of  FY81/82  Enlisted  Master  File  data  for  the  purpose  of  ASVAB 
validation,  Task  1  will  perform  numerous  descriptive  analyses  of  existing 
computerized  Army-wide  criterion  measures.  This  descriptive  information, 
along  with  information  on  data  quality  gleaned  from  the  editing  of  these 
data,  is  being  passed  on  to  Task  4  staff  to  aid  in  Subtask  4.1.  In  addition, 
Task  1  will  identify  representative  samples  of  FY81  accessions  for  use  by 
Task  4  in  searching  noncomputerized  (microfiche)  records  and  will  enter  the 
resulting  information,  merging  it  with  other  available  measures  for  analysis 
by  Task  4  staff. 

In  planning  for  analyses  for  Task  4,  the  staff  of  Task  1  will  assist  with 
scaling  and  aggregation  problems  and  will  focus  particular  attention  on  the 
development  of  utility  measurement  procedures.  The  translation  of 
performance  outcomes  into  a  single  utility  scale  that  is  comparable  across  MOS  is 
vital  to  the  derivation  of  the  prediction  equations  that  must  be  passed  to 
Project  B  if  CPAS  is  to  become  a  reality.  At  the  same  time,  the  development 
of  appropriate  utility  scales  is  one  of  the  most  methodologically  complex 
tasks  in  this  project.  The  data  necessary  for  the  development  of  utility 
scales  will  be  collected  under  the  auspices  of  Task  4  by  Task  5  field 
personnel.  Task  1  staff  will  provide  input  to  the  planning  of  this  data  collection 
effort  and  will  support  analyses  of  the  resulting  data  to  derive  utility 
scales  that  are  comparable  across  MOS.  Task  4  will  perform  the  difficult 
and  critical  task  of  preparing  appropriate  stimuli  for  scaling  that 
accurately  reflect  the  domain  of  performances  to  be  scaled  and  preparing 
instructions  for  eliciting  valid  judgments  of  relationships  among  the 
stimuli.  A  description  of  the  procedures  for  developing  accurate  stimuli 
and  a  valid  judgment  paradigm  is  given  under  Subtask  9  of  Task  4. 

Field  tests  of  new  first-tour  Army-wide  criteria  will  be  carried  out  in  1984 
and  1985,  and  Task  1  will  assist  in  the  incorporation  of  the  data  resulting 
from  these  field  tests  into  the  LRDB,  the  analysis  of  the  relations  of  these 
measures  to  MOS-specific  measures,  and  the  analysis  of  the  different  ways 
these  two  sets  of  criteria  relate  to  measures  taken  at  induction  or  during 
training. 

Second-tour  Army-wide  criteria  will  be  field-tested  in  1986,  and  these  data 
will  be  analyzed  to  identify  predictors  that  should  be  taken  into  account  at 
the  time  of  making  reenlistment  decisions. 

The  coordination  of  efforts  of  Task  1  and  Task  4  will  be  facilitated  by  the 
location  of  some  staff  of  both  tasks  in  the  same  office  in  Alexandria, 
Virginia. 

Support  for  the  Development  of  MOS-specific  Criteria  (Task  5).  An  initial 
effort  confronting  Task  5  has  been  the  selection  of  MOS  for  special 
criterion  development  that  will  be  representative  of  the  entire  range  of  MOS 
in  the  Army.  The  strategy  has  been  to  identify  clusters  of  "similar"  MOS, 
so  that  MOS  can  be  sampled  in  such  a  way  that  every  Army  MOS  is  in  the  same 
cluster  as  at  least  one  of  the  sampled  MOS.  As  a  part  of  the  ASVAB  area 
composite  score  development  subtask,  Task  1  will  identify  clusters  of  MOS 
with  similar  profiles  of  pre-induction  performance  predictors.  Task  1  will 
also  perform  cluster  analyses  of  MOS  job  content  and  requirement  ratings. 
The  results  of  these  analyses  will  either  confirm  the  adequacy  of  the  MOS 
sample  initially  selected  or  suggest  refinements  of  it. 
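The coverage requirement stated above (every Army MOS in the same cluster as at least one sampled MOS) reduces to a simple check once cluster memberships are known. A minimal sketch, assuming a dictionary from MOS label to cluster label; the labels are hypothetical.

```python
def uncovered_clusters(cluster_of, sampled):
    """Clusters that contain none of the sampled MOS.

    cluster_of: dict MOS -> cluster label; sampled: iterable of MOS labels.
    An empty list means every MOS shares a cluster with a sampled MOS.
    """
    covered = {cluster_of[m] for m in sampled}
    return sorted({c for c in cluster_of.values() if c not in covered})
```

Any cluster returned would call for adding an MOS from it to the sample, or for checking whether the clustering itself should be refined.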


Planning  for  Task  5  analyses  will  include  sampling  design;  selection  of 
jobs,  tasks,  and  duty  positions  within  MOS  to  maximize  the  generalizability 
of  the  results;  and  the  design  of  administration  procedures  to  support 
estimation  of  reliability  and  validity  of  MOS-specific  criteria.  Task  1 
will  assist  in  this  effort  in  response  to  requests  from  Task  5. 

As  described  under  Activity  5.5.5,  Task  1  staff  will  assist  Task  5  in  the 
analysis  of  field  test  data.  A  major  component  of  this  assistance  will  be 
the  implementation  of  LISREL  V  models  for  assessing  construct  validity 
and  criterion  equivalence.  This  approach  involves  the  confirmation  of 
constructs  hypothesized  to  underlie  the  observed  measures  and  the  assessment 
of  possible  measurement  bias  from  the  use  of  different  types  of  instruments. 
Such  models  will  be  used  on  a  larger  scale  in  the  concurrent  validation  on 
the  FY83/84  cohort  to  assess  the  extent  of  criterion  equivalence  across  the 
Task  4  and  5  (and  to  a  lesser  extent  Task  3)  measures. 

The  coordination  of  efforts  between  Tasks  1  and  5  will  be  facilitated  by  the 
location  of  some  staff  of  both  tasks  in  the  same  office  in  Washington,  D.C. 

Subtasks  1.2  through  1.5  described  above  provide  Project  A  with  the 
flexibility  needed  to  respond  to  unexpected  developments  and  problems  in 
various  areas.  While  focusing  our  effort  on  the  validation  of  existing 
measures  and  the  longitudinal  predictive  validation  of  new  measures,  we  can 
allocate  resources  to  assist  one  or  another  of  the  instrument  development 
tasks,  as  the  needs  arise.  For  example,  programming  support  may  be  provided 
to  Task  2  for  the  development  of  computer-mediated  psychomotor  tests. 

At  the  same  time,  by  maintaining  careful  documentation  of  files  and 
project-wide  availability  of  particular  analytical  procedures,  Task  1  can 
eliminate  many  repetitive  searches  for  similar  data  files  and  similar 
analytical  packages.  In  this  way,  Task  1  will  contribute  to  the  cohesion  of 
the  project  as  a  whole. 




The  subtask  can  be  divided  into  three  efforts:  (a)  validation  of  the 
pre-induction  measures  used  for  initial  MOS  assignment;  (b)  validation  of 
the  training  measures  for  use  in  reassignment  following  training;  and  (c) 
preparation  of  inputs  for  the  CPAS.  The  second-tour  FY83/84  cohort 
performance  data,  including  final  first-tour  attrition,  eligibility,  and 
reenlistment  rates,  will  be  analyzed  along  with  the  first-tour  FY86/87  data. 
However,  we  will  not  include  the  second-tour  performance  measures  for  the 
FY86/87  cohort  at  this  point,  because  they  will  not  become  available  within 
the  current  time  frame  for  this  subtask.  A  fourth  analysis,  involving  the  use 
of  first-tour  performance  measures  in  determining  reenlistment  and 
attrition  rates  for  the  FY86/87  cohort,  will  be  made  in  1991  under  combined 
task  efforts. 

Validation  of  Procedures  for  Initial  Selection  and  MOS  Classification 


This  subtask  will  be  carried  out  in  ten  steps,  over  a  54-month  period, 
starting  in  April,  1985.  The  ten  steps  are  as  follows: 

(1)  Set  objectives; 

(2)  Determine  the  data  collection  and  cross-validation  designs; 

(3)  Evaluate  and  revise  objectives,  if  necessary; 

(4)  Acquire,  check,  and  clean  the  data; 

(5)  Carry  out  preliminary  analyses; 

(6)  Plan  and  execute  the  main  validation  analyses; 

(7)  Evaluate  the  cultural  fairness  of  the  proposed  procedures; 

(8)  Prepare  interim  reports; 

(9)  Make  final  inputs  to  the  Project  B  allocation  algorithm; 

(10)  Prepare  the  final  report. 

The  following  sections  describe  our  present  thinking  with  regard  to  each  of 
these  steps. 

Step  1.  Set  objectives. 

The  objectives  must  be  thoroughly  and  operationally  defined  in  advance  of 
the  troop  requests,  so  that  an  adequate  data  base  will  be  assured. 
Therefore,  the  plans  presented  here  must  be  updated,  starting  early  in  1985, 
so  that  they  can  be  reviewed  by  ARI  with  sufficient  time  for  revision  and 
refinement  before  presentation  as  part  of  the  justification  for  the  troop 
support  request  in  May,  1985.  (This  troop  support  request  will  cover  the 
administration  of  the  Experimental  Predictor  Battery  to  the  sample  [possibly 
revised]  MOS  in  the  FY86/87  cohort.) 

Step  2.  Determine  the  sampling  and  cross-validation  designs. 

Unlike  the  earlier  validation  of  the  ASVAB  using  existing  cohort  data,  this 
validation  will  require  the  administration  of  new  instruments  to  a  sample  of 
recruits.  As  a  result,  the  data  base  will  have  substantially  fewer  members, 
although,  we  hope,  the  data  on  each  recruit  will  be  substantially  richer  and 
the  numbers  within  selected  MOS  will  be  comparable  to  the  numbers  available 
for  the  earlier  work.  It  is  essential  that  adequate  numbers  of  women  and 
minorities  be  included,  so  that  the  effectiveness  of  the  CPAS  for  each  group 
can  be  evaluated. 

An  average  of  2,200  trainees  from  each  of  the  19  focal  MOS  will  be 
administered  the  Experimental  Predictor  Battery  as  they  enter  AIT.  The 
numbers  administered  the  first-tour  performance  measures  will  be  about  25-40 
percent  of  the  initial  sample  size,  because  only  12  to  15  Army  sites  will  be 
visited  to  collect  first-tour  performance  data.  In  addition,  a  sizeable 
percentage  of  enlisted  personnel  will  either  have  left  the  service  or  be 
otherwise  unavailable  for  performance  testing  during  the  site  visit  period. 
As  in  the  case  of  the  FY83/84  cohort,  enlisted  personnel  in  the  FY86/87 
cohort  who  reenlist  will  be  administered  second-tour  performance  measures. 
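The sample sizes above imply the following rough arithmetic. Because 2,200 is an average, actual MOS counts will vary, and the 25-40 percent range is the plan's estimate, not the output of an attrition model; the function name is illustrative.

```python
def first_tour_range(n_per_mos, low_pct, high_pct):
    """Per-MOS number expected on first-tour measures, given the percentage
    range of the initial sample that can be reached (integer arithmetic)."""
    return n_per_mos * low_pct // 100, n_per_mos * high_pct // 100


N_PER_MOS = 2200   # average trainees per focal MOS, from the text
N_MOS = 19         # focal MOS, from the text

total_initial = N_PER_MOS * N_MOS                               # 41,800 trainees
per_mos_low, per_mos_high = first_tour_range(N_PER_MOS, 25, 40)  # 550 to 880
total_low, total_high = per_mos_low * N_MOS, per_mos_high * N_MOS  # 10,450 to 16,720
```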

Prior  to  the  analyses,  the  balanced  half-sample  pseudo-replicates  will  be 
defined.  They  will  be  used  in  the  main  validation  analyses  for  the  purpose 
of  cross-validation. 

Step  3:  Evaluate  and  revise  objectives,  if  necessary. 

Based  on  the  troop  availability  and  the  budget  for  test  administration,  it 
may  be  necessary  to  limit  the  scope  of  the  validation.  On  the  other  hand, 
if  the  results  of  the  MOS  clustering,  FY81/82  file  data  simulations,  and 
similarity  scaling  are  promising  (see  Validity  Generalization  section,  pages 
26-31),  it  may  be  possible  to  expand  the  analyses  to  simulate  the 
application  of  the  algorithm  to  a  larger  set  of  MOS.  The  impact  of  MOS 
subgroup  heterogeneity  on  the  applicability  of  regression  equations  will 
also  be  explored  using  the  FY86/87  data  (see  page  29).  The  results  of  these 
and  other  analyses  could  lead  to  a  revised  set  of  objectives  based 
on  what  the  preliminary  analyses  indicate  can  be  accomplished  with  the 
data.  In  this  regard,  the  collection  and  analysis  of  file  data  from  other 
MOS  (in  addition  to  the  19)  may  be  deemed  warranted. 


Step  4:  Acquire,  check,  and  clean  the  data. 


Data will be acquired, edited, and entered into the LRDB in accordance with
the procedures outlined in the Longitudinal Research Database Plan. The
data will be checked for duplicate records and obvious keypunch errors,
screened for outliers and implausible values using relational edits, imputed
where necessary using a statistical algorithm (PROC IMPUTE), and entered
into the LRDB. Further details on our editing procedures are found in the
LRDB Plan.
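The LRDB Plan's edit specifications are not reproduced in this excerpt. The sketch below only illustrates the three kinds of checks named in the text: duplicate records, range screens, and a relational edit that compares two fields of the same record. The field names and limits are hypothetical, and imputation is not shown.

```python
from collections import Counter

# Hypothetical plausible ranges; the real LRDB edit limits are not given in this plan.
RANGE_LIMITS = {"afqt_percentile": (1, 99), "age_at_entry": (17, 35)}


def edit_check(records):
    """Return (soldier_id, problem) pairs flagged by duplicate, range, and relational edits."""
    problems = []
    counts = Counter(rec["soldier_id"] for rec in records)
    for rec in records:
        sid = rec["soldier_id"]
        if counts[sid] > 1:
            problems.append((sid, "duplicate record"))
        for field, (low, high) in RANGE_LIMITS.items():
            value = rec.get(field)
            if value is not None and not low <= value <= high:
                problems.append((sid, f"{field} out of range"))
        # Relational edit (hypothetical fields): AIT cannot end before the soldier entered service.
        ait_end = rec.get("ait_end_year")
        if ait_end is not None and ait_end < rec["entry_year"]:
            problems.append((sid, "AIT completion precedes entry"))
    return problems


records = [
    {"soldier_id": 1, "afqt_percentile": 50, "age_at_entry": 19, "entry_year": 1986, "ait_end_year": 1986},
    {"soldier_id": 2, "afqt_percentile": 120, "age_at_entry": 19, "entry_year": 1986, "ait_end_year": 1985},
    {"soldier_id": 1, "afqt_percentile": 50, "age_at_entry": 19, "entry_year": 1986, "ait_end_year": 1986},
]
print(edit_check(records))
```

Flagged records would then go to manual review or imputation rather than being dropped automatically.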


Step  5:  Carry  out  preliminary  analyses  on  predictors  and  criteria. 


Basic descriptive statistics on the predictors and criteria will be
computed. From this information, corrections for restriction of range and
unreliability (where appropriate) will be computed using the methodology
discussed in the section on the development of ASVAB composites. After
corrections, the appropriate descriptive statistics will be recomputed
(e.g., correlations after restriction of range correction). Composite
scores will be added to the file where appropriate.
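The methodology referred to above is described in the ASVAB composites section, which is not part of this excerpt. As a hedged illustration only, the two textbook univariate corrections are Thorndike's Case II formula for direct restriction of range on the predictor and Spearman's correction for attenuation. The multivariate procedures actually adopted may differ, and the function names are hypothetical.

```python
import math


def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case II: correct r for direct selection on the predictor.

    sd_unrestricted is the predictor SD in the applicant population,
    sd_restricted the SD in the selected (e.g., accessioned) sample.
    """
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1 - r**2 + (r**2) * (u**2))


def correct_attenuation(r, rel_x=1.0, rel_y=1.0):
    """Spearman correction; keep rel_x = 1.0 to correct for criterion unreliability only."""
    return r / math.sqrt(rel_x * rel_y)


# Observed validity 0.30 in a sample whose predictor SD is half the applicant SD,
# then corrected for a criterion reliability of 0.64.
r_range = correct_range_restriction(0.30, sd_unrestricted=2.0, sd_restricted=1.0)
r_full = correct_attenuation(r_range, rel_y=0.64)
print(round(r_range, 3), round(r_full, 3))
```

Correcting only for criterion unreliability (leaving `rel_x` at 1.0) estimates the validity of the predictor as it will actually be used, which is the usual choice in selection work.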

Step  6:  Plan  and  execute  the  main  validation  analyses. 


The  longitudinal  validation  will  synthesize  all  that  has  been  learned  in  the 
course  of  the  project,  both  in  terms  of  new  measurement  procedures  and  in 
terms  of  methodological  issues.  We  envision  an  analysis  strategy  consisting 
of  a  series  of  eight  steps,  as  follows: 


Step 6A: Express the criterion measures in terms of expected utility to the Army.


The measurement of the differential utility to the Army of good versus poor
performance on the critical tasks in each selected MOS is an essential
component of the development of the CPAS. The only way in which assignments
can be optimized is through comparison of the "payoff" to the Army for
having the recruit assigned to this or that MOS. Payoffs resulting from
good performance must be measured on the same scale for each MOS, and it is
that scale, or numerical assignment, that we are referring to as "utility."
Task 1 will translate the utility data, collected by Task 5 personnel, into
the common scales needed for the selection and classification system and for
the validation effort, following the procedures developed in Subtask 4.9.
MOS utilities for second-tour performance levels will be applied to the
FY83/84 data.
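The subtask procedures referenced above are not described in this excerpt. One simple way to put MOS-specific performance on a common utility scale is to have judges assign utilities to a few anchor performance levels in each MOS and interpolate between them. The sketch below assumes that scheme; the MOS labels, anchor levels, and utility values are invented for illustration.

```python
import numpy as np

# Hypothetical judged utilities (common 0-100 scale) at anchor performance levels.
UTILITY_ANCHORS = {
    "MOS_A": {"level": [1, 3, 5, 7], "utility": [0, 40, 70, 90]},
    "MOS_B": {"level": [1, 3, 5, 7], "utility": [0, 30, 65, 100]},
}


def to_common_utility(mos, performance):
    """Map performance ratings in one MOS onto the common utility scale by linear interpolation."""
    anchors = UTILITY_ANCHORS[mos]
    return np.interp(performance, anchors["level"], anchors["utility"])


# The same rating (4) is worth different amounts to the Army in the two MOS.
print(to_common_utility("MOS_A", 4), to_common_utility("MOS_B", 4))
```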

Step 6B: Determine the conditions under which the validation analyses will
be run.

The optimization algorithm, or an approximation to it, will be used to
evaluate the expected results of computerized allocation. This algorithm
can then be executed under a variety of conditions to test its sensitivity
to different factors. These include: (a) omitting a subset of the predictor
measures, (b) changing particular recruiting strategies (such as bonuses),
and (c) changing relative utilities in different MOS. In the main
validation analyses, the effect of these conditions on classification
efficiency will be determined. Predictors will be evaluated in terms of
increments in expected utility.

Step  6C:  Find  linear  (and  perhaps  non-linear)  combinations  of  predictors 
which  maximally  predict  expected  utility. 

Both linear and non-linear scoring procedures for the predictor instruments
that maximally predict the utility criteria will be sought. An expected
value will be obtained for each recruit in each MOS, and these will be
aggregated as appropriate for use in the Project B algorithm.
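For the linear case, the expected value for each recruit in each MOS can be sketched as one ordinary least-squares equation per MOS, fit to the utility criterion among incumbents and then applied to every recruit. This is a minimal illustration with hypothetical names. Non-linear scoring could be approximated by adding transformed predictor columns (for example, squares or products) before fitting.

```python
import numpy as np


def fit_utility_equation(predictors, utility):
    """OLS weights (intercept first) predicting a utility criterion from predictor scores."""
    design = np.column_stack([np.ones(len(predictors)), predictors])
    weights, *_ = np.linalg.lstsq(design, utility, rcond=None)
    return weights


def expected_utility_table(weights_by_mos, predictors):
    """Recruits x MOS matrix: each column applies one MOS equation to every recruit."""
    design = np.column_stack([np.ones(len(predictors)), predictors])
    return np.column_stack([design @ w for w in weights_by_mos])


# Toy data: one predictor, utility = 1 + 2 * score in the MOS whose incumbents we observed.
scores = np.array([[0.0], [1.0], [2.0], [3.0]])
w_a = fit_utility_equation(scores, np.array([1.0, 3.0, 5.0, 7.0]))
table = expected_utility_table([w_a, np.array([0.0, 1.0])], scores)
```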




Step 6D: Find Bayesian (or empirical Bayes) combinations of predictors.


Bayesian simultaneous estimators often show greater stability than
least-squares estimators (which maximize predictability in the sample). They
also tend to have smaller mean squared error, particularly when prior
information about small subgroups can be incorporated into the estimates.
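As one concrete instance, not necessarily the estimator Task 1 will adopt, a normal-normal empirical Bayes model shrinks each MOS's estimate of a regression weight toward the mean across MOS, more strongly when that MOS's sample is small (large sampling variance). The between-MOS variance is estimated below by the method of moments; the function name is hypothetical.

```python
import numpy as np


def empirical_bayes_shrink(estimates, sampling_var):
    """Shrink per-MOS estimates of one regression weight toward their mean.

    Normal-normal model: estimate_j ~ N(theta_j, v_j), theta_j ~ N(mu, tau2).
    mu is the unweighted mean of the estimates; tau2 is a method-of-moments
    estimate of the between-MOS variance, truncated at zero.
    """
    b = np.asarray(estimates, dtype=float)
    v = np.asarray(sampling_var, dtype=float)
    mu = b.mean()
    tau2 = max(0.0, b.var(ddof=1) - v.mean())
    if tau2 == 0.0:
        return np.full_like(b, mu)  # no detectable between-MOS variation: pool completely
    shrink = v / (v + tau2)  # share of the weight given to the pooled mean
    return shrink * mu + (1 - shrink) * b


# Three MOS with equal sampling variance 0.01 and weights 0.2, 0.4, 0.6.
print(empirical_bayes_shrink([0.2, 0.4, 0.6], [0.01, 0.01, 0.01]))
```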

Step 6E: Compute the coefficient matrix for each condition (as defined in
Step 6B).

We shall then compute the aggregate expected utility estimate for each
condition. The results of various alternative optimizations will be
compared with each other and with the best current procedures for selection
and classification. Using either linear programming or the method agreed
upon at that point in time, we shall compute the increments in utility that
accrue as a result of each addition to the prediction system.
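Linear programming is named above as one option. For the special case of one recruit per seat with fixed MOS quotas, the allocation reduces to an assignment problem, which the sketch below solves with SciPy's `linear_sum_assignment` after replicating each MOS column once per seat. The increment in utility from a predictor is then the difference in realized utility between allocations driven by predictions with and without it. The quotas, matrices, and function names are hypothetical, and for brevity the full-battery predictions stand in for observed utilities.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def allocate(predicted_utility, quotas):
    """Assign recruits (rows) to MOS (columns), maximizing total predicted utility.

    quotas[j] is the number of seats in MOS j; recruits left without a seat get -1.
    """
    seat_mos = np.repeat(np.arange(len(quotas)), quotas)  # MOS index of each seat
    rows, seats = linear_sum_assignment(predicted_utility[:, seat_mos], maximize=True)
    assignment = np.full(predicted_utility.shape[0], -1)
    assignment[rows] = seat_mos[seats]
    return assignment


def realized_utility(utility, assignment):
    """Total utility of an assignment, evaluated on a (hold-out) utility matrix."""
    placed = assignment >= 0
    return utility[np.flatnonzero(placed), assignment[placed]].sum()


# Utilities predicted from the full battery vs. a battery missing one predictor.
full = np.array([[5.0, 1.0], [4.0, 3.0], [1.0, 2.0]])
reduced = np.array([[1.0, 5.0], [4.0, 3.0], [1.0, 2.0]])
quotas = [1, 1]
gain = realized_utility(full, allocate(full, quotas)) - realized_utility(full, allocate(reduced, quotas))
```

Replicating columns keeps the problem small for illustration; with thousands of recruits and many MOS, a transportation or general linear programming formulation would be the more economical choice.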

The battery will be evaluated in terms of classification efficiency by
implementing the allocation algorithm to simulate training/job assignments
for the FY86/87 recruits. The improvement in classification efficiency
resulting from basing decisions on the CPAS will serve as the final
justification for its full-scale implementation. To ensure the long-term
effectiveness of the CPAS, the validations will be repeated with the system
parameters varied to represent a number of plausible conditions under which
the CPAS will be applied.




Clearly, this evaluation of the practical value of the CPAS also needs to
take into account the cost of its implementation. The intermediate
validation process and the cost-benefit analysis in the course of
predictor development are aimed at ensuring the practical utility of the
CPAS. Nevertheless, at this final stage of validation, we shall make an
integrated assessment of the system by simultaneously considering the cost
and the gain in utility.

Step 6F: Determine the extent to which the inclusion of training measures
improves classification efficiency.

This validation will closely parallel the preceding plan, adding training
measures to pre-induction measures as predictors and using first-term
performance as the criterion. These analyses will address the costs and
benefits of reclassification following training. The need for this
validation will be determined in the course of the project, as the potential
value of improved prediction for assisting in assignment decisions following
training is clarified. This value will depend on a complex combination of
scheduling constraints that limit the flexibility of reassignments
following training.

Step 6G: Evaluate the adequacy of the training and Army-wide criterion
measures as stand-alone criteria.

To enable the Army to update the system feasibly, it is important that proxy
measures for the detailed and expensive MOS-specific performance measures be
shown to be valid when used without the specific measures. Both training
outcomes and Army-wide criteria will be used in the algorithm to determine
the extent to which personnel allocations would differ if validity
parameters were based solely on criterion measures that did not include
MOS-specific performance measures. In addition, the impact on allocation
outcomes of using the less expensive MOS-specific measures (ratings and job
knowledge tests) but not hands-on measures will be examined.

Step  6H:  Carry  out  the  cross-validation  and  stability  analyses. 

Task 1 will carry out the main validation analyses in accordance with the
balanced half-sample cross-validation design chosen in Step 2. This
involves repeated calculations in the half-samples specified in the design
of the longitudinal validation. In addition, stability analyses, which are
described in detail below, will be done using both the FY83/84 and FY86/87
databases.
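The following is a minimal sketch of double cross-validation over half-sample splits, assuming each split is given as a boolean mask. Weights are fit by OLS in one half, the correlation between predicted and observed criterion is computed in the other, both directions are used, and the results are averaged. The data are simulated and the function name is hypothetical.

```python
import numpy as np


def cross_validity(predictors, criterion, half_masks):
    """Mean cross-validated r over half-sample splits, fitting in each half and testing in the other."""
    design = np.column_stack([np.ones(len(criterion)), predictors])
    rs = []
    for mask in half_masks:
        for fit, test in ((mask, ~mask), (~mask, mask)):
            w, *_ = np.linalg.lstsq(design[fit], criterion[fit], rcond=None)
            rs.append(np.corrcoef(design[test] @ w, criterion[test])[0, 1])
    return float(np.mean(rs))


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = x @ np.array([0.5, 0.3]) + rng.normal(size=200)
# Four random half-sample splits stand in for the balanced design chosen in Step 2.
masks = [rng.permutation(200) < 100 for _ in range(4)]
r_cv = cross_validity(x, y, masks)
```

Comparing `r_cv` with the correlation obtained when fitting and testing on the full sample gives an estimate of shrinkage.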

It would be desirable if the predictor-criterion relationships could be
shown to be stable over changes in time and conditions. The findings of the
current validation could then be continuously applied to provide the basis
for predictions of job performance, while changes in supply and requirements
would be accommodated by modifying the constraints in the allocation
algorithm. It is therefore important that we investigate the extent to
which predictive relationships and utility structure change over time and
conditions. Moreover, we need to determine the factors causing the
changes so that we can anticipate them and plan for appropriate
modifications to the allocation system.

The stability of the factor structure and interrelationships among measures
will be assessed by comparing the analytic results between the FY83/84 and
FY86/87 cohorts. With regard to the stability of utility measures, Task 4
plans to repeat the data collection on utilities with a different group of
Army staff and at different times (summer of 1985 and 1988). These utility
data will be analyzed to assess the stability of the estimation procedure.

We propose to conduct the stability analysis in two ways. The relationships
will first be examined with conventional statistical methods in terms of
means and covariances. For the stability of factor structures, various
stability coefficients have been suggested in a slightly different context,
that of factorial invariance across populations and across measurement
instruments/tests (Lawley & Maxwell, 1971; Meredith, 1964; Mulaik, 1972).
These indices can be extended to the present problem. However, we plan to
address the issue of stability primarily by analyzing the data from the
FY83/84 and FY86/87 cohorts simultaneously with the LISREL model (Joreskog,
1971; McGaw & Joreskog, 1971; Sorbom, 1974, 1978, 1981; Joreskog &
Sorbom, 1980).
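Tucker's coefficient of congruence is one widely used index of this kind, though the plan does not say whether it is among those intended. The following minimal sketch compares matched factor-loading columns from the two cohorts; values near 1 indicate a similar loading pattern. It does not implement the multi-group LISREL analysis that the plan treats as primary, and the loadings are made up.

```python
import numpy as np


def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient for each pair of matched factor columns."""
    a = np.asarray(loadings_a, dtype=float)
    b = np.asarray(loadings_b, dtype=float)
    return (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))


# Illustrative loadings of three tests on two factors in each cohort.
fy83 = [[0.80, 0.10], [0.70, 0.20], [0.10, 0.90]]
fy86 = [[0.75, 0.15], [0.72, 0.10], [0.20, 0.85]]
print(congruence(fy83, fy86).round(3))
```

The coefficient is insensitive to uniform rescaling of a column, so it tracks the pattern of loadings rather than their overall size.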

The simultaneous analysis of multi-sample data allows us to examine in
detail how the relationships and measurement structures differ between
samples. Not only will we be able to determine the degree to which the
variances-covariances and means differ across samples (e.g., populations over
time), but we will also be able to test the stability of regression
coefficients directly with the model. Here the important question is
whether analyses of two different samples lead to the selection of the same
tests, weighted in substantially the same way, for each cluster of
MOS. If we find that the relationships differ, we will search for an
explanation of these differences, keeping in mind the concurrent versus
predictive nature of the two cohort validations. The factors that are
related to the changes in relationships will be identified so that updates
of the prediction models and/or the classification instruments may be
planned.
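The plan tests coefficient stability within a multi-group LISREL model. A simpler, swapped-in technique that answers the same narrow question for a single regression equation is the Chow test: fit the equation separately in each cohort and in the pooled data, then compare residual sums of squares. The following is a minimal sketch on simulated data; it returns the F statistic and its degrees of freedom rather than a p-value, and the names are hypothetical.

```python
import numpy as np


def _with_intercept(X):
    return np.column_stack([np.ones(len(X)), X])


def _rss(design, y):
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ w
    return float(resid @ resid)


def chow_test(X1, y1, X2, y2):
    """Chow F statistic for H0: identical regression weights (incl. intercept) in both cohorts."""
    A, B = _with_intercept(X1), _with_intercept(X2)
    k = A.shape[1]
    rss_separate = _rss(A, y1) + _rss(B, y2)
    rss_pooled = _rss(np.vstack([A, B]), np.concatenate([y1, y2]))
    df_resid = len(y1) + len(y2) - 2 * k
    f_stat = ((rss_pooled - rss_separate) / k) / (rss_separate / df_resid)
    return f_stat, k, df_resid


rng = np.random.default_rng(1)
X1, X2, X3 = (rng.normal(size=(300, 2)) for _ in range(3))
y1 = X1 @ np.array([0.5, 0.3]) + rng.normal(size=300)
y2 = X2 @ np.array([0.5, 0.3]) + rng.normal(size=300)   # same weights as cohort 1
y3 = X3 @ np.array([2.0, -1.0]) + rng.normal(size=300)  # different weights
f_same, _, _ = chow_test(X1, y1, X2, y2)
f_diff, _, _ = chow_test(X1, y1, X3, y3)
```

Unlike the LISREL approach, this test assumes error-free predictors and equal residual variance in the two cohorts.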

While the examination of the predictive relationships and measurement
structures is informative because they constitute the foundations of the
allocation system, the direct test of the stability of the performance of
the CPAS rests on its actual implementation under varying conditions. For
this purpose, we will further evaluate stability by implementing the
allocation algorithm with different prediction models, obtained from
different components of the available data bases, and comparing the
assignment outcomes in terms of classification efficiency. In
conducting this evaluation, we will also simultaneously vary the system
parameters to examine the effects on the allocation outcomes of changes in
priorities, political decisions, and social/economic conditions (some of
these changes will require modifications of specific constraints; some may
simply require changes in utility structure).


Again, where the changes in relationships and/or system parameters appear to
affect the performance of the allocation system, we will attempt to identify
factors that account for such effects so that we can estimate the frequency
with which the CPAS will require updating. In making this estimate, we
shall also take into consideration the costs of updates and the benefits to
be gained from the resulting better allocation outcomes.


These stability analyses will produce a summary of the findings with special
emphasis on those relationships that are most likely to change with time and
social/economic conditions, and the factors that are likely to cause such
changes. We will also make recommendations on the update frequency for the
CPAS and ways of updating the system.


Step 7: Evaluate the cultural fairness of the classification procedures.


The analysis of differential validity and reliability across racial and sex
groups (and possibly other groups of special interest) will follow the
procedures described earlier in conjunction with the validation of existing
predictors (Section 1). However, the ultimate objective of the present
endeavor is to ensure cultural fairness with reference to classification
efficiency. The investigation of moderating effects employing the concept of
differential validity and reliability is a screening step. Having identified
variables that appear to have significant moderating effects on predicting
differential job performance, and having obtained estimates of the moderated
relationships between predictors and criteria, our next step is to apply
these results to making assignments of individuals to jobs. The results of
such assignments can then be evaluated for the effectiveness of employing
moderators to assure fairness of a selection/classification procedure. That
is, we want to identify, for blacks as well as whites and for women as well
as men, those particular MOS where they can expect to make the greatest use
of their abilities to contribute to the Army's mission and thus to progress
most rapidly in a career ladder.

The same assignment procedure, but using separate prediction equations (i.e.,
using different scoring formulas or different predictor sets), will be
applied to each of the groups that are of special concern. Classification
efficiency will be estimated from the assignment outcome and compared with
the outcome obtained without consideration of different predictions across
groups. In doing this, we will mostly conduct simulations using the
available data base and assignment procedure. If for some reason the data
base is not adequate to support such simulations, we will supplement it
with computer-generated data.
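The following is a minimal sketch of the comparison, assuming ordinary least squares. One pooled equation and one equation per group are fit, and the mean residual by group shows whether the pooled equation systematically over- or under-predicts for a group. Predictions from either approach could then be fed to the same assignment procedure. The data and names are illustrative only.

```python
import numpy as np


def common_and_separate_predictions(X, y, group):
    """Predicted criterion from one pooled OLS equation and from a separate equation per group."""
    design = np.column_stack([np.ones(len(y)), X])
    w_common, *_ = np.linalg.lstsq(design, y, rcond=None)
    common = design @ w_common
    separate = np.empty_like(common)
    for g in np.unique(group):
        idx = group == g
        w_g, *_ = np.linalg.lstsq(design[idx], y[idx], rcond=None)
        separate[idx] = design[idx] @ w_g
    return common, separate


def mean_residual_by_group(y, predicted, group):
    """Mean of (observed - predicted) per group; negative means the equation over-predicts."""
    return {int(g): float(np.mean(y[group == g] - predicted[group == g])) for g in np.unique(group)}


# Toy data: same slope in both groups, but group 1 scores 2 points higher at every predictor value.
X = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = X + 1 + 2 * group
common, separate = common_and_separate_predictions(X, y, group)
print(mean_residual_by_group(y, common, group))
```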

As the use of moderators is expected to improve predictions in general, we
are interested in examining the accompanying improvement of overall
classification efficiency for the applicant population as a whole. That is,
we will compare the assignment outcome using moderated relationships with
that using common relationships for all groups. Because the introduction of
moderators into the allocation system incurs additional costs both in data
collection and in algorithmic labor (computations, data management, etc.),
we need to evaluate the practical value of each moderator for its ability to
improve the fairness as well as the overall efficiency of the system. If
incorporation of the moderated predictions into the system raises neither
the degree of fairness among groups nor the total efficiency, then either we
need to take a different approach to investigating the problem of fairness or
we may have to accept the system as being appropriate (fair and efficient)
within the limitations of currently available technology.

Step 8: Make final inputs to the CPAS.

The implementation of the CPAS being developed in Project B requires data
from Project A concerning the prediction of performance in various jobs for
each enlistee and the utility measures (effectiveness coefficients) of the
various performances. The exact form of the data to be input into the
allocation system has to be specified by Project B. The staff of Task 1
will coordinate with the staff of Project B to determine what kinds of data
are most directly useful to the system and in what format they should be
provided. Then we will generate the required data according to the
specifications.

The prediction of performance is to be based on the predictor-criterion
relationships as estimated in the validation of the classification battery.
The utility measures are to be based on the analysis of the value and
preference judgments collected from the Army staff. Thus the required data
can be generated as by-products of the validations in Task 1 and the
analysis of utility data in Tasks 1 and 4.

At the present time, we believe that the most convenient form of input is to
provide the classification measures for each enlistee to be assigned, a
probabilistic prediction model for each MOS, and a utility function of the
performance measures for each type of job (or, where a continuous utility
function is not determined, a vector containing the utility for each
specified level of performance in that job). In this way, the computations
of the expected utilities for each job-person match can be performed within
the allocation system. The advantage of this form of input is that if any
of the three elements that constitute the bases of utility computations
changes (as may often be the case when social or political conditions
change), the system can be operated without modification except for an
update of that particular data element(s).

An alternative would be to calculate the expected utilities for each of the
possible job-person matches for direct input into the system. This may
require less computational effort in the CPAS but will sacrifice the
flexibility afforded by the first form.
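Under the first form of input, the allocation system would combine a predicted distribution over performance levels with each MOS's utility vector. The following is a minimal sketch, assuming performance is discretized into the same number of levels in every MOS; the array shapes and function name are hypothetical. Precomputing the result and passing only the recruit-by-MOS matrix corresponds to the second form.

```python
import numpy as np


def expected_utility_matrix(level_probs, level_utilities):
    """Expected utility of every recruit-MOS match.

    level_probs:     recruits x MOS x levels; level_probs[i, j] is the predicted
                     distribution of recruit i's performance level in MOS j.
    level_utilities: MOS x levels; utility to the Army of each level in each MOS.
    """
    return np.einsum("ijl,jl->ij", level_probs, level_utilities)


probs = np.array([[[0.2, 0.5, 0.3],     # recruit 0 in MOS 0
                   [0.1, 0.3, 0.6]]])   # recruit 0 in MOS 1
utilities = np.array([[0.0, 50.0, 100.0],
                      [20.0, 40.0, 60.0]])
eu = expected_utility_matrix(probs, utilities)
```

If only the utility vectors change (for example, after a policy shift), `probs` can be reused unchanged, which is the flexibility argument made for the first form.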

In addition, depending on the actual design of the system, other data may be
required for its operation. Based on our understanding of the proposed
approach to be taken in Project B, the planning module of the system is
intended to treat individuals as members of classes, and we are to develop
the class definitions for that use. If this is indeed the chosen strategy,
we will define homogeneous classes of individuals on the basis of the
classification measures so as to minimize the loss of information on the
predictor-criterion relationships as a result of such grouping. The
definition of these classes will depend in part on how sensitive the utility
measure is in differentiating the various levels of performance that can be
predicted from the classification measures. Clustering methods will be
employed to divide the profiles of expected utilities of a representative
sample of enlistees into a reasonable number of classes. Each class is then
identified by the range of the predictor values for the group of enlistees
in it. Then an analysis of the predicted performances (in the utility scale)
based on the class definitions and those based on actual classification
measures will be conducted to assess the loss of information from using the
class definitions.
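The plan does not name a clustering method. As one possibility, plain k-means on the recruits' expected-utility profiles yields classes. The share of profile variance lost when each recruit is replaced by its class mean then gives a simple measure of the information lost by grouping. The function names, the choice of k-means, and the toy profiles are assumptions.

```python
import numpy as np


def _nearest(profiles, centers):
    dist = ((profiles[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(profiles, k, n_iter=100, seed=0):
    """Lloyd's k-means on rows of `profiles` (recruits x MOS expected utilities)."""
    rng = np.random.default_rng(seed)
    centers = profiles[rng.choice(len(profiles), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(profiles, centers)
        new_centers = np.array([
            profiles[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _nearest(profiles, centers), centers


def information_loss(profiles, labels, centers):
    """Share of total profile variance lost by replacing each recruit with its class centroid."""
    within = ((profiles - centers[labels]) ** 2).sum()
    total = ((profiles - profiles.mean(axis=0)) ** 2).sum()
    return within / total


profiles = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans(profiles, k=2)
loss = information_loss(profiles, labels, centers)
```

Redefining each class by ranges of predictor values, as the plan describes, would add a further loss that could be measured in the same way.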

A third category of data that may be requested by the CPAS is estimates of
the probability that a particular enlistee will survive the initial training
and estimates of the time and cost involved in training before a recruit is
assigned to the job unit. Similarly, estimates of the probability that an
enlistee will fail to complete the first tour can be computed. Reenlistment
rates and second-tour performance levels and associated utilities can
likewise be estimated. These findings, based on the FY83/84 cohort data, may
be incorporated into the optimization objective to improve the utility of
the allocation outcomes because they represent yet another kind of value and
cost to the Army. The purpose of examining the potential use of these data
in making assignments is to evaluate whether utilization of such information
will substantially increase the utility of the allocation outcomes.
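The plan leaves open how such probabilities would enter the optimization objective. The following minimal sketch uses an assumed chain of outcomes: utility accrues only after training is completed, second-tour utility only on reenlistment, and training cost is incurred regardless. It shows how the estimates could be folded into a single net expected utility per recruit-MOS match. The formula and argument names are assumptions for illustration, not the plan's specification.

```python
import numpy as np


def net_expected_utility(p_train, p_complete_tour, first_tour_utility,
                         p_reenlist, second_tour_utility, training_cost):
    """Net expected utility of a recruit-MOS match under an assumed chain of outcomes.

    Assumptions: no utility unless training is completed; first-tour utility only if
    the tour is completed; second-tour utility only on reenlistment after a completed
    tour; training cost is paid whatever the outcome. Works elementwise on arrays.
    """
    career_value = first_tour_utility + p_reenlist * second_tour_utility
    return p_train * p_complete_tour * career_value - training_cost


# One recruit in two MOS: the second pays more per tour but has higher attrition and cost.
value = net_expected_utility(
    p_train=np.array([0.9, 0.7]),
    p_complete_tour=np.array([0.8, 0.6]),
    first_tour_utility=np.array([100.0, 140.0]),
    p_reenlist=np.array([0.5, 0.5]),
    second_tour_utility=np.array([60.0, 80.0]),
    training_cost=np.array([20.0, 35.0]),
)
```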


The estimation of these probabilities and costs will be based primarily on
the existing data base and may require additional effort to merge historical
data into the longitudinal data base. At a later time, data on attrition,
training, and reenlistment will be accumulated for the FY86/87 cohort and
added to the earlier historical data to update the estimates of such
probabilities and training costs. The new estimates can be used in the field
implementation of the CPAS if Project B decides that use of such information
improves the utility of its outcomes.


Step 9: Write interim reports.

Interim reports will be prepared as various stages of the analyses are
completed. Reports are anticipated summarizing our findings with regard to
the following issues:


(1) The costs and benefits of adding new predictors to the
existing pre-induction battery.

(2) The costs and benefits of using information obtained
earlier in a soldier's career (selection/classification
test scores, training success, and Army-wide measures) to
make later administrative decisions (eligibility for
reenlistment, second-tour assignment).

(3) The stability and generalizability of our validation
findings.

(4) Methods of continued updating of the estimation
procedures for the CPAS, using, for example, "proxy"
criteria.

(5) The effects of using moderator variables in the
allocation algorithm.


Step 10: Write the final report.

The final report will conclude the work of the project. It will pertain
not only to the validation of pre-induction predictors but also to the
validation of training measures as predictors and to the preparation of
inputs to the CPAS.


SUMMARY  OF  EXPECTED  OUTCOMES 


The overall objective of Task 1 is to validate the new selection and
classification instruments that are being developed by Project A for the
Army. The longitudinal validation, to be carried out with the FY86/87 cohort
data, is aimed at producing concrete evidence to justify the operational use
of the new battery in the CPAS being designed by Project B. Along with this
validation effort, Task 1 will develop prediction models of a recruit's
future job performance in his/her Army career, and empirically determine the
expected utilities to the Army if he/she is assigned to each of the
particular MOS. The predictions and the utility measures are the foundations
of the personnel allocation system. The longitudinal validation will also
examine the incremental benefit and the feasibility of sequential decisions
in the allocation system: initial selection at the time of induction
followed by reassignment decisions at various choice points, such as
post-training and reenlistment, as information on the recruit is accumulated
from the early period of his/her Army career.

Task 1 will also carry out a sensitivity analysis of the performance of the
allocation system by implementing the CPAS under varying conditions (e.g.,
altering system constraints to reflect differing priorities in the Army's
mission, changing the utility structures to represent changing social,
economic, and political situations, and adjusting system parameters to
accommodate new or revised policy decisions). This analysis will also be
directed at identifying factors that can affect the effectiveness of the new
allocation procedures. By examining the factors that are most critical in
determining the efficiency of the system, Task 1 will make recommendations
on the frequency of required updates, the components to be updated, and the
updating procedures so that the effectiveness of the CPAS can be assured at
all times.

It should also be added that, in order to maintain continuity of the
validation efforts initiated during this project, Task 1 also expects to
provide a basis for ongoing validations of the classification procedures.
For this purpose, the extent to which the three types of performance
measures (training success, Army-wide, and MOS-specific) are equivalent to
one another will be carefully assessed. On the basis of this assessment,
Task 1 will determine the feasibility of continued validations using earlier
or more easily obtained performance measures as substitutes for later
and/or more expensive measures of performance. If judged feasible,
recommendations will be made regarding practical procedures for conducting
satisfactory validations that employ more readily obtainable measures and
adhere to resource constraints.

In the course of achieving this ultimate objective, Task 1 will perform
interim validations aimed at providing information to aid in the development
of new predictors that will have high predictive validity and can be
efficiently employed in the CPAS. In addition, Task 1 will provide
technological and analytical support for the development of reliable and
valid performance measures. These performance measures are required for the
validation of the classification instruments, but may also prove useful for
operational evaluations of enlisted personnel in the future.

A more immediate outcome of Task 1 will be the ASVAB composite scores to be
recommended for use in the current selection and classification procedure.
One of the first endeavors of Task 1 will be to validate the existing
predictors (primarily ASVAB, high school education, and biographical data).
This initial validation effort will, at the same time, produce
recommendations on the best set(s) of composite scores by evaluating the
differential validity of alternative scoring formulas and considering the
cost-benefit trade-off of adopting new scoring procedures in the current
selection practices. The impact of various cutoffs for selection of recruits
into an MOS or MOS family will also be determined.

In order to accomplish these practical objectives, Task 1 will have to
investigate the conceptual and methodological issues surrounding personnel
research (specifically, with regard to validation and development of
measuring instruments). These investigations are expected to produce
practical solutions to technical problems such as how to adjust for
restriction of range, and how to measure differential validity and
classification efficiency. Such solutions will be developed both by
adapting existing techniques and by devising new methods. These research
efforts will not only yield technical advances useful in the present
project and in other personnel research, but will also generate scientific
information that will increase our understanding of personnel selection
problems.


The  operational  and  scientific  outcomes  that  can  be  expected  from  Task  1  are 
summarized  below. 


Operational  Outcomes 


(1)  A  recommended  set  of  ASVA8  composites  and  cutoffs  for  use  In  the 
Army's  current  selection  and  classification  procedures.  Use  of 
these  composites  Is  expected  to  Increase  the  effectiveness  of  the 
Initial  classification  decisions. 

(2)  Feedback  Information  to  other  tasks  that  will  support  the 

development  of  valid  and  reliable  new  predictor  and  performance 
measures  for  the  Army.  In  addition  to  serving  as  criteria  In  the 
validations,  the  performance  measures  may  also  be  used  In  future 
operational  evaluations  of  enlisted  personnel.  Such  information 
will  be  extracted  from  the  validation  analysis  as  well  as  from  the 
statistical  and  psychometric  analyses  of  the  field  test  data. 

(3)  Practical  procedures  for  continued  validations  of  the  classi¬ 

fication  Instruments.  These  procedures  will  be  developed  In 
conjunction  with  the  Investigation  of  criterion  equivalence  among 
performance  measures. 

(4)  A  cost-effective  classification  battery  that  can  be  administered 
easily  and  efficiently  at  the  MEPs  and  be  readily  used  by  the  CPAS 
for  making  tralnlng/MOS  assignments.  This  will  be  achieved  by  the 
cost-benefit  analyses  of  competing  batteries  while  they  are  being 
developed. 

(5) A sequential decision framework that will enable the Army to make
reassignments at various choice points (e.g., post-training,
promotion, and reenlistment). Through such a sequential decision
process, the Army can increase its efficiency in personnel
utilization.


(6) A common utility scale across MOS and across performance levels
within MOS and, based on this scale, a composite of a recruit's
expected total utility to the Army (training performance, general
conduct, and task-specific performance). This utility
composite will be used as the effectiveness coefficient required
to drive the allocation algorithm in the CPAS.

(7) Accurate prediction models for estimating a recruit's future
performance utility from the classification measures. These
predictions will serve as a fundamental basis for an efficient
allocation system that aims to increase the effectiveness of
performance in the Army.

(8) Recommendations on procedures for updating the CPAS so as to
ensure its continued ability to achieve optimal classification
outcomes and thus the best utilization of recruits by the
Army.
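Outcomes (6) through (8) describe a utility composite that serves as the effectiveness coefficient for an allocation algorithm. The plan does not specify that algorithm, so the following is only a minimal sketch, assuming the allocation can be posed as the classical assignment problem: given a square matrix of expected utilities (recruit by training/MOS slot), find the one-to-one assignment with the largest total. The function name and the utility values are hypothetical. Exhaustive search is used so the logic is transparent; realistic problem sizes would call for a polynomial-time method such as the Hungarian algorithm.

```python
from itertools import permutations


def best_assignment(utility):
    """Exhaustively find the recruit-to-slot assignment with maximum total utility.

    utility[i][j] is the (hypothetical) expected utility of placing recruit i
    in training/MOS slot j. The matrix must be square (one slot per recruit).
    Exhaustive search costs O(n!) and is meant only for tiny illustrations.
    """
    n = len(utility)
    if any(len(row) != n for row in utility):
        raise ValueError("utility matrix must be square")
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        # perm[i] is the slot given to recruit i
        total = sum(utility[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total


if __name__ == "__main__":
    U = [[4, 1, 3],
         [2, 0, 5],
         [3, 2, 2]]
    print(best_assignment(U))
```

With the example matrix, recruit 0 goes to slot 0, recruit 1 to slot 2, and recruit 2 to slot 1, for a total utility of 11.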


Scientific  Outcomes 


(1) Improved or new technical solutions to the methodological problems
that plague research on personnel decisions. Examples of such
problems are restriction of range, measures of classification
efficiency and differential validity, validity generalization,
cross-validation, nonlinear prediction models, moderating
effects, and stability of estimates.

(2) New or improved applications of psychometric methods in the
development of measuring instruments.

(3) Advances in the technology of using computerized adaptive testing
in information gathering for the purpose of decision making.
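Several of the problems listed in (1), notably cross-validation and stability of estimates, turn on the fact that a squared multiple correlation fitted in one sample overstates its population value. As an illustration only, the sketch below applies the shrinkage adjustment commonly attributed to Wherry (1931); the function name and the example numbers are hypothetical, and the adjustment estimates the population R² rather than the (lower) cross-validated R².

```python
def wherry_adjusted_r2(r2: float, n: int, k: int) -> float:
    """Shrinkage-adjusted squared multiple correlation.

    Uses the adjusted-R^2 formula commonly attributed to Wherry (1931):
        R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)
    r2: sample squared multiple correlation
    n:  number of cases
    k:  number of predictors
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    # Hypothetical: R^2 = .25 from 4 predictors fitted on 101 cases.
    print(wherry_adjusted_r2(0.25, 101, 4))
```

In this hypothetical case the sample value of .25 shrinks to about .22.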




REFERENCES 


Anastasi, A. Psychological testing (3rd ed.). New York: Macmillan, 1968.

Brogden, H. E. On the interpretation of the correlation coefficient as a
measure of predictive efficiency. Journal of Educational Psychology,
1946, 37, 65-76.

Cleary, T. A. Test bias: Prediction of grades of Negro and white students
in integrated colleges. Journal of Educational Measurement, 1968, 5,
115-124.


Cochran, W. G. Some effects of errors of measurement in linear regression.
Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics
and Probability, 1972, 527-539.

Cronbach, L. J. Equity in selection: Where psychometrics and political
philosophy meet. Journal of Educational Measurement, 1976, 13, 31-41.

Frederiksen, N., & Melville, S. D. Differential predictability in the use
of test scores. Educational and Psychological Measurement, 1954, 14,
647-656.

Gillman, L., & Goode, H. H. An estimate of the correlation coefficient of
a bivariate normal population when X is truncated and Y is dichotomized.
Harvard Educational Review, 1946, 16, 52-55.

Ghiselli, E. E. Differentiation of individuals in terms of their
predictability. Journal of Applied Psychology, 1956, 40, 373-377.

Ghiselli, E. E. The prediction of predictability. Educational and
Psychological Measurement, 1960, 20, 3-8.

Ghiselli, E. E. Moderating effects and differential reliability and
validity. Journal of Applied Psychology, 1963, 47, 81-86.

Ghiselli, E. E. Comment on the use of moderator variables. Journal of
Applied Psychology, 1972, 56, 270.

Greener, J. M., & Osburn, H. G. Accuracy of corrections for restriction
in range due to explicit selection in heteroscedastic and nonlinear
distributions. Educational and Psychological Measurement, 1980, 40,
337-347.


Guion, R. M. Employment tests and discriminatory hiring. Industrial
Relations, 1966, 5, 20-37.

Heckman, J. J. Sample selection bias as a specification error. Econometrica,
1979, 47, 153-161.

Joreskog, K. G. Simultaneous factor analysis in several populations.
Psychometrika, 1971, 36, 409-426.

Joreskog, K. G., & Sorbom, D. Simultaneous analysis of longitudinal data
from several cohorts (Research Report 80-5). Uppsala, Sweden:
University of Uppsala, 1980.

Joreskog, K. G., & Sorbom, D. LISREL V: Analysis of linear structural
relationships by the method of maximum likelihood (Research Report
81-8). Uppsala, Sweden: University of Uppsala, 1981.

Lawley, D. N. A note on Karl Pearson's selection formulae. Royal Society
of Edinburgh Proceedings, Section A, 1943, 62, 28-30.

Lawley, D. N., & Maxwell, A. E. Factor analysis as a statistical method
(2nd ed.). London: Butterworths, 1971.

Lindley, D. V. Regression lines and the linear functional relationship.
Journal of the Royal Statistical Society, Series B, 1947, 9, 218-244.

Linn, R. L. Fair test use in selection. Review of Educational Research,
1973, 43, 139-161.

Maier, M. H. Validation of selection and classification tests in the Army.
Working paper, Personnel Utilization Technical Area 82-k. Alexandria,
VA: U.S. Army Research Institute, November 1981.

Maier, M. H. Issues for defining ASVAB 11/12/13/14 aptitude composites.
Alexandria, VA: Center for Naval Analyses, 1982.

McCarthy, P. J. The use of balanced half-sample replication in
cross-validation studies. Journal of the American Statistical Association,
1976, 71, 596-604.

McGaw, B., & Joreskog, K. G. Factorial invariance of ability measures in
groups differing in intelligence and socio-economic status. British
Journal of Mathematical and Statistical Psychology, 1971, 24, 154-168.

Meredith, W. M. The estimation of criterion parameters from a biased sample
(Doctoral dissertation, University of Washington). Dissertation
Abstracts, 1958, 19, 981.




Meredith, W. M. Notes on factorial invariance. Psychometrika, 1964, 29,
177-185.

Miner, M. G., & Miner, J. B. Analysis of Uniform Guidelines on Employee
Selection Procedures. Washington, D.C.: Bureau of National Affairs
Educational System, 1979.

Mulaik, S. A. The foundations of factor analysis. New York: McGraw-Hill,
1972.

Novick, M. R., & Thayer, D. T. An investigation of the accuracy of the
Pearson selection formulas (Research Memorandum RM-69-22). Princeton,
NJ: Educational Testing Service, 1969.

Olkin, I., & Pratt, J. W. Unbiased estimation of certain correlation
coefficients. Annals of Mathematical Statistics, 1958, 29, 201-211.

Rydberg, S. Bias in prediction. Stockholm: Almqvist and Wiksell, 1963.

Saunders, D. R. The "moderator variable" as a useful tool in prediction.
Proceedings of the 1954 Invitational Conference on Testing Problems.
Princeton, N.J.: Educational Testing Service, 1955.

Saunders, D. R. Moderator variables in prediction. Educational and
Psychological Measurement, 1956, 16, 209-222.

Sharma, S., Durand, R. M., & Gur-Arie, O. Identification and analysis of
moderator variables. Journal of Marketing Research, 1981, 18, 291-300.


Sims, W. H. An application of factor analysis to the construction of
improved classification composites from the Armed Services Vocational
Aptitude Battery (ASVAB) Forms 6 and 7. Arlington, VA: Center for
Naval Analyses, 1978.

Sims, W. H., & Hiatt, C. M. Validation of the Armed Services Vocational
Aptitude Battery (ASVAB) Forms 6 and 7 with applications to ASVAB Forms
8, 9, and 10 (CNS 1160). Research report. Alexandria, VA: Center for
Naval Analyses, 1981.

Sims, W. H., & Mifflin, T. L. A factor analysis of the Armed Services
Vocational Aptitude Battery (ASVAB) Forms 6 and 7 (Memorandum). Arlington,
VA: Center for Naval Analyses, 1978.

Snee, R. D. Validation of regression models: Methods and techniques.
Technometrics, 1977, 19, 415-428.

Sorbom, D. A general method for studying differences in factor means and
factor structures between groups. British Journal of Mathematical and
Statistical Psychology, 1974, 27, 229-239.

Sorbom, D. An alternative to the methodology for analysis of covariance.
Psychometrika, 1978, 43, 381-396.

Sorbom, D. Structural equation models with structured means. In K. G.
Joreskog & H. Wold (Eds.), Systems under indirect observation:
Causality, structure, and prediction. Amsterdam: North-Holland, 1981.

Srinivasan, V., & Weinstein, A. S. Effects of curtailment on an admissions
model for a graduate management program. Journal of Applied Psychology,
1973, 58, 339-346.


Wherry, R. J. A new formula for predicting the shrinkage of the coefficient
of multiple correlation. Annals of Mathematical Statistics, 1931, 2,
440-457.

Wise, L. L., & Wang, M. Project A: Development of improved Army selection
and classification systems, longitudinal research database plan (Draft).
Human Resources Research Organization, 1983.

Zedeck, S. Problems with the use of "moderator" variables. Psychological
Bulletin, 1971, 76, 295-310.

Zedeck, S., Cranny, C. J., Vale, C. A., & Smith, P. C. Comparison of "joint
moderators" in three prediction techniques. Journal of Applied
Psychology, 1971, 55, 234-240.




TASK  2  RESEARCH  PLAN 


PRE-INDUCTION  PREDICTION  OF  ARMY  SUCCESS 

GENERAL  PURPOSE  OF  TASK  2 

The general purpose of Task 2 is to identify an efficient and effective set
of initial or pre-induction predictors of soldier performance. By efficient,
we mean that the time and money expended on operational administration
of the predictors are kept as low as possible; by effective, we
mean that the predictors forecast as accurately as possible the degree of
success to be expected of recruits in various aspects of soldier performance,
e.g., overall adaptation to the Army, completion of training,
performance in specific MOS, and reenlistment.

There are two different but related aspects to this general purpose.
First, we will evaluate the effectiveness of the present set of initial
predictors used by the Army. The major initial predictor now in use is a
set of cognitive tests, the Armed Services Vocational Aptitude Battery
(ASVAB). Prior research shows that this battery and its similar predecessors
are fairly effective in predicting how well soldiers will perform
during training, but much less information is available about its
effectiveness in predicting other important areas of soldier performance,
notably on-the-job performance. Task 2, in conjunction with Task 1, will
perform research to evaluate the effectiveness or validity of the ASVAB for
predicting these additional aspects of performance.




Second, the ASVAB contains only cognitive tests. Measures of other types
of human abilities and characteristics have been shown to be useful for
predicting effective performance in training and on the job in a number of
occupational areas (Dunnette, 1976; Owens, 1976). Task 2 will identify and
develop new predictors that are most likely to be effective and efficient
additions to the present set of predictors. The validity or effectiveness
of these new predictors will be investigated in the same way as the validity
of the present set of predictors. Evaluating the efficiency of
newly developed predictors will require analysis of the improvement in prediction
of soldier performance gained by using the new predictors over
that obtained by using the present set of initial predictors alone.

Thus, the two aspects of Task 2 are closely intertwined: new predictors
will be developed to add to and complement the present set of predictors,
the validity of all predictors for all aspects of soldier performance will
be scrutinized, and the complete set of present and new predictors will be
analyzed to identify the most efficient set of initial predictors.
Intertwined as well are the school-based criteria and predictors developed
in Task 3, the Army-wide measures and utilities developed in Task 4, the
specific-job measures developed in Task 5, and the integrated analyses of
Task 1. The "one project" nature of Project A is perhaps best illustrated
in the evaluation of existing and new initial predictors.


BACKGROUND  ISSUES  AND  RATIONALE 

Performance Relatedness: Construct Validation

Accuracy of personnel selection and classification decisions rests ultimately
on the degree of congruence between scores on
selection measures and performance measures collected some time later within
the settings or jobs where persons have been placed. As Wernimont and
Campbell (1968) have suggested in their widely cited article, congruence is
most likely when elements of the predictor measures actually sample aspects
of the performance domain to be measured, as opposed to resting on untested
assumptions about relatedness between predictor performance and later performance.

This argument leads to a research strategy designed to identify patterns of
behavioral consistency linking meaningful samples of performance drawn from
different contexts, e.g., academic performance, extracurricular activities,
job and work performance, leisure pursuits, behavioral responses to standardized
simulations, etc. Linkages between different areas may be made by
logical analysis of activity components into directly measurable tasks and
knowledge (content-oriented strategy), by rational analysis (by job and/or
subject matter experts) of activity components according to relatively
consistent behavioral abstractions (construct-oriented strategy), or by
demonstrating statistical patterns of similarity across activity components
(criterion-oriented strategy).




The behavioral consistency point of view has the following conceptual and
methodological implications for the process of evaluating, identifying, and
developing new pre-induction predictors of Army performance effectiveness:

1. Choice of domains within which to develop predictors is focused on
evaluations of performance and performance relatedness. In the
present context, the areas of performance to be investigated are
general adjustment to a career in the Army, success in Army school
training, effectiveness in specific Army tasks and jobs, and choosing
to continue an Army career through reenlistment.

2. Validation of predictor measures begins with a content-oriented
strategy that may be elaborated with existing behavioral theory
(construct-oriented strategy) and confirmed ultimately with empirically
demonstrated statistical relationships (criterion-oriented
strategy).

3. An increasing number of scholars (e.g., Dunnette & Borman, 1979;
Guion, 1980; Messick, 1980; Cronbach, 1980; Peterson & Bownas, 1982)
are placing primary emphasis on the construct-oriented phase of the
above strategy. Arguing, in effect, that all validation is really
construct validation, they point out that validation encompasses (a)
a theoretical or conceptual component, the more highly developed the
better; (b) a developmental step involving instrumentation based on
content validation, in which judgments are made about the appropriateness
of content for specific objectives; and (c) an empirical component
that entails determining empirical relationships between a
measure and other measures. Therefore, any discussion of the validity
of a predictor for a specific purpose must summarize the best available
information for all these components.

In sum, then, the approach to evaluating existing pre-induction predictors
and to developing and evaluating new ones rests on behavioral consistency
and construct validation as central concepts. Dimensions of soldier performance
will be identified that are internally consistent and relatively
distinct from each other. Corresponding classes of predictor measures will
be structured on the basis of either direct behavior sampling or strong
conceptual or theory-based inferences, and linkages between the performance
dimensions and predictors will be evaluated empirically.

Validity  Generalization  and  Situational  Moderation 

Hunter, Schmidt, and Jackson (1981) have presented methods they call
State-of-the-Art Meta-Analysis to help decide whether validity
results may be generalized across different situations and populations.
They present and illustrate methods for correcting observed sample
validities for such artifactual components as restriction in range,
criterion and predictor unreliability, and variability due to sampling
error. Several recent investigations of large data sets (Dunnette et al., 1981;
Pearlman, Schmidt, & Hunter, 1980; Schmidt, Hunter, & Caplan, 1981; Callender &
Osburn, 1981; Schmidt, Hunter, & Pearlman, 1981) have shown that validities
generalize across settings more readily than traditional thinking in
selection research has assumed.
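As a rough sketch of two of the corrections mentioned above, assuming the standard textbook formulas rather than any procedure specified in this plan: disattenuation for criterion unreliability divides the observed validity by the square root of the criterion reliability, and the Thorndike Case II formula corrects for direct (explicit) selection on the predictor. The function names and example values are hypothetical. The order in which corrections are applied, and whether predictor unreliability should also be corrected, depend on the purpose of the estimate and are not settled here.

```python
import math


def correct_for_criterion_unreliability(r_xy, r_yy):
    """Disattenuate an observed validity for criterion unreliability only."""
    if not 0 < r_yy <= 1:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_xy / math.sqrt(r_yy)


def correct_for_direct_range_restriction(r, u):
    """Thorndike Case II correction for explicit selection on the predictor.

    r: validity observed in the selected (restricted) group.
    u: predictor SD in the unrestricted applicant group divided by the
       predictor SD in the selected group (u >= 1 under restriction).
    """
    return r * u / math.sqrt(1.0 + r * r * (u * u - 1.0))


if __name__ == "__main__":
    print(round(correct_for_criterion_unreliability(0.30, 0.64), 3))
    print(round(correct_for_direct_range_restriction(0.30, 1.5), 3))
```

For example, an observed validity of .30 with criterion reliability .64 becomes .375, and a validity of .30 observed in a group whose predictor SD is two-thirds of the applicant SD (u = 1.5) becomes about .43.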


Nonetheless, some variance in validities frequently remains even
after the so-called artifactual components leading to variability have been
taken into account. For example, Linn, Harnisch, and Dunbar (1981) report
that about 30 percent of the variance across 726 validities between the Law
School Admission Test and first-year grades in law school remains after
estimating and removing variance due to artifactual components. Dunnette
et al. (1981) estimate that 35 percent of validity variance across 70
electric generating companies may be moderated by situation- or
company-specific factors.
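The variance decomposition described above can be illustrated with a minimal "bare-bones" sketch, assuming only sampling error is removed (the full Hunter, Schmidt, and Jackson procedure also corrects for range restriction and unreliability). The validities and sample sizes below are hypothetical and do not reproduce the Linn et al. or Dunnette et al. analyses. The variance expected from sampling error uses the usual approximation (1 - r_bar^2)^2 / (N_bar - 1), where r_bar is the sample-size-weighted mean validity and N_bar the average sample size.

```python
def bare_bones_meta(rs, ns):
    """Sample-size-weighted 'bare-bones' summary of a set of validity coefficients.

    rs: observed validities; ns: matching sample sizes.
    Returns (mean validity, observed variance, sampling-error variance,
    proportion of observed variance attributable to sampling error).
    """
    if len(rs) != len(ns) or not rs:
        raise ValueError("need matching, non-empty lists")
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    n_bar = total_n / len(ns)
    var_err = (1.0 - r_bar ** 2) ** 2 / (n_bar - 1.0)
    # Cap at 1: sampling error cannot explain more than all of the variance.
    prop_err = min(var_err / var_obs, 1.0) if var_obs > 0 else 1.0
    return r_bar, var_obs, var_err, prop_err


if __name__ == "__main__":
    # Hypothetical validities from four studies of 100 cases each.
    r_bar, var_obs, var_err, prop = bare_bones_meta([0.10, 0.45, 0.20, 0.50], [100] * 4)
    print(f"mean r = {r_bar:.4f}, sampling-error share = {prop:.2f}, residual = {1 - prop:.2f}")
```

Here about 29 percent of the observed variance would be attributed to sampling error, leaving roughly 71 percent as residual variance for other artifacts or situational moderators to explain.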

Considering the accumulating evidence cited above, it seems reasonable to
expect that validities are, in fact, much more general across different
settings than has previously been presumed. We may reasonably estimate
that the amount of variance in validities due to situational components
probably ranges between 0 percent and a maximum of perhaps *0 percent. In
the context of predicting performance by enlisted personnel in the Army, it
is our expectation that both existing and new pre-induction predictors will
show validities that are general across quite broad groupings of Army
settings, Army performance measures, Army population subgroups, and Army
tasks and jobs. Our methodological strategy will, however, test empirically
the limits of such validity generalization and capitalize upon patterns
of situational moderation where they are found to exist. In effect,
by evaluating validity results in this way for both current and new
pre-induction predictors, we will be able to discover patterns of similarity
and difference between validities according to such potential situational
taxonomies as types or families of predictor constructs, dimensions of
Army-wide success, clusters or families of Army schools, Army job (MOS)
families, and sex and race subgroups.

Our strategy will be to discover predictors with validities that generalize
across sex and race subgroups and across all dimensions of Army performance.
However, we shall also seek to discover predictors that possess
different validities across a sample of Army jobs, in order to make more
likely the increased efficiencies of personnel utilization that can be
realized by strategies of differential placement or classification.

Efficiencies  of  Classification 

Ideally, the Army would place all persons in jobs best suited to them, an
outcome that would assure not only that each person could use
his/her abilities in the best possible way but also that the Army could
allocate its human resources optimally across all available job assignments.
This state of affairs is never perfectly attainable, but the goal
of personnel classification in the Army should be to optimize the
matching of human skills with job requirements within the constraints dictated
by the particular mix of human resources and jobs that exists at any
particular time.

In fact, the number of jobs available in the Army and their great diversity
afford opportunities for efficient utilization of human resources not
available to most employers. Many of these potential advantages are
described by Dunnette (1966, pages 184-185). The most important advantage
was recognized and described many years ago by Brogden (1951). He showed
that a personnel classification strategy could yield a marked increase in
overall job effectiveness for an organization by making possible the use of
more advantageous selection ratios for each of the distinct job areas to be
filled.
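Brogden's argument can be made concrete with a small simulation, offered only as a sketch under stated assumptions: two jobs with equal quotas, all recruits accepted, and predicted performance in the two jobs drawn as standard bivariate normal with correlation rho. In this two-job case, giving job 1 to the half of the recruits with the largest advantage of job-1 over job-2 predicted performance maximizes the total; random assignment is the comparison. The function name, sample size, and rho are hypothetical choices, and NumPy is assumed to be available.

```python
import numpy as np


def classification_gain(n=10_000, rho=0.5, seed=0):
    """Mean predicted performance (standard-score units) for two equal-quota jobs.

    Predicted performance in job 1 and job 2 is drawn as standard bivariate
    normal with correlation rho (a hypothetical setup). Returns the mean for
    random assignment and for assignment on the difference score.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    pred = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    half = n // 2

    # Random assignment: the first half of a random permutation goes to job 1.
    order = rng.permutation(n)
    random_mean = (pred[order[:half], 0].sum() + pred[order[half:], 1].sum()) / n

    # Classification: with two jobs and fixed quotas, sending the people with the
    # largest (job1 - job2) advantage to job 1 maximizes the total.
    diff_order = np.argsort(pred[:, 0] - pred[:, 1])[::-1]
    class_mean = (pred[diff_order[:half], 0].sum() + pred[diff_order[half:], 1].sum()) / n
    return random_mean, class_mean


if __name__ == "__main__":
    print(classification_gain())
```

With rho = .5 the classification mean is roughly .4 standard-score units, against roughly zero for random assignment, even though every recruit is placed.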

In spite of the obvious advantages to be gained from classification as
opposed to pure selection, classification procedures are often difficult to
implement because of the many constraints operating in the real world of
personnel decision making. In the Army, such constraints might include
stated preferences of recruits (for job areas, types of schooling, geographic
location, etc.), special skill or educational requirements, the Army's
ability to estimate human resource requirements and to provide appropriate
training or job assignments over the long term as opposed to needs for
filling immediate short-term vacancies, and cyclical fluctuations
in recruit availability throughout the year. Above all, such a
strategy requires empirical information about the way in which the
various aspects of soldier performance are, in fact, predicted by different
configurations of scores on pre-induction predictors. It is crucial that
attention be given to the development of such predictors from the beginning
of the research effort and that all estimates of predictor-criterion linkages
be evaluated with an emphasis on differential predictor
validities for differing assignments in the Army. This will ensure that
the efficiencies of accurate classification procedures can be realized by
the Army in its recruitment and training of enlisted personnel.
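The emphasis on differential validity follows from a closed-form version of the same idea. In the hypothetical two-job, equal-quota case sketched here (bivariate normal predicted scores, each with SD equal to the validity), the expected gain of classification over random assignment is validity × sqrt((1 - rho) / pi), where rho is the correlation between the two predicted scores. This special case is consistent with Brogden's result that the gain grows with the square root of one minus the intercorrelation; the function name and values are illustrative assumptions.

```python
import math


def two_job_gain(validity, rho):
    """Expected gain (criterion SD units) of classification over random assignment.

    Hypothetical two-job case: equal quotas, predicted performance in each job has
    SD equal to `validity`, and the two predicted scores correlate `rho`.
    Under bivariate normality the optimal split on the difference score gives
        gain = validity * sqrt(2 * (1 - rho)) * phi(0) = validity * sqrt((1 - rho) / pi)
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return validity * math.sqrt((1.0 - rho) / math.pi)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.8, 0.95):
        print(f"rho = {rho:.2f}  gain = {two_job_gain(0.5, rho):.3f}")
```

As rho approaches 1, that is, as the predictors stop differentiating between jobs, the gain falls to zero however high the validity.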




Incremental Validity and Utility


As noted in the statement of the general purpose of Task 2, a major part of the
activity will be devoted to estimating the amount of improvement in prediction
obtained by combining new pre-induction predictors with current predictors,
that is, estimating the amount of incremental validity.

Incremental validity has traditionally been expressed as statistically
significant increases in variance accounted for (R²) as a result
of adding more predictor components, or as statistically significant
differences between product-moment coefficients or hit rates derived
from competing selection strategies, e.g., old versus new predictors,
clinical versus statistical prediction, etc.
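The traditional incremental-validity comparison described above can be sketched as a hierarchical regression: fit the current predictors, add the new ones, and test the change in R² with the usual F statistic on (k_new, n - k_full - 1) degrees of freedom. This is a minimal sketch assuming ordinary least squares and NumPy; the function names are hypothetical, and the simulated "cognitive" and "new measure" scores are invented for illustration.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def incremental_validity(X_old, X_new, y):
    """Return (R2_old, R2_full, delta_R2, F) for adding X_new to X_old.

    X_old, X_new: 2-D arrays (n cases by k predictors); y: 1-D criterion array.
    F has (k_new, n - k_full - 1) degrees of freedom.
    """
    n = len(y)
    k_new = X_new.shape[1]
    k_full = X_old.shape[1] + k_new
    r2_old = r_squared(X_old, y)
    r2_full = r_squared(np.column_stack([X_old, X_new]), y)
    delta = r2_full - r2_old
    f_stat = (delta / k_new) / ((1.0 - r2_full) / (n - k_full - 1))
    return r2_old, r2_full, delta, f_stat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    cognitive = rng.normal(size=(n, 1))    # hypothetical current composite
    new_measure = rng.normal(size=(n, 1))  # hypothetical new predictor
    y = 0.4 * cognitive[:, 0] + 0.2 * new_measure[:, 0] + rng.normal(size=n)
    print(incremental_validity(cognitive, new_measure, y))
```

The F test answers only whether the increment is statistically reliable; its practical value is the question the incremental utility analyses address.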

Task 2 will make use of such incremental validity comparisons, but these
comparisons are intended as way stations on the road to more
meaningful comparisons of the relative increase in utility or benefit
attributable to the addition of new predictors. These incremental utility
analyses will be conducted primarily by Task 1 and Task 4 staff on Project
A and will consider the costs associated with use of new pre-induction
predictors and the benefits associated with different levels of job performance.
Results of these analyses will, however, inform decisions made in Task 2
about the relative usefulness of the newly developed pre-induction
measures.
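The incremental utility analyses are assigned to Tasks 1 and 4, and the plan does not fix their model. One common choice, shown here only as a hedged sketch, is the Brogden-Cronbach-Gleser formulation, assuming top-down selection on a normally distributed predictor: the gain is the number selected times the increase in validity times the criterion SD (in whatever utility units are adopted) times the mean standardized predictor score of those selected, minus testing costs. All names and values are hypothetical.

```python
from statistics import NormalDist


def bcg_utility_gain(n_selected, delta_validity, sd_y, selection_ratio,
                     cost_per_applicant=0.0):
    """Brogden-Cronbach-Gleser estimate of the gain from a validity increase.

    Assumes top-down selection on a normally distributed predictor, so the mean
    standardized predictor score of those selected is phi(z_cut) / selection_ratio.
    sd_y is the criterion SD in the chosen utility units; cost_per_applicant is
    the added testing cost for each applicant screened.
    """
    if not 0.0 < selection_ratio < 1.0:
        raise ValueError("selection ratio must lie strictly between 0 and 1")
    nd = NormalDist()
    z_cut = nd.inv_cdf(1.0 - selection_ratio)
    mean_z_selected = nd.pdf(z_cut) / selection_ratio
    n_applicants = n_selected / selection_ratio
    return (n_selected * delta_validity * sd_y * mean_z_selected
            - n_applicants * cost_per_applicant)


if __name__ == "__main__":
    print(round(bcg_utility_gain(1000, 0.05, 1.0, 0.5), 2))
```

With 1,000 selected at a selection ratio of .50, a validity increase of .05, and a criterion SD of 1 utility unit, the estimated gain is about 40 units before costs.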




Technological advances in computer hardware, particularly the development
of microcomputers, and the application of computers to psychological
measurement have several implications for Task 2. First, testing can
potentially be decentralized with much less risk to the security of scoring
keys and with a large gain in the standardization of administration. Second,
measurement of human abilities or other characteristics can be accomplished
in less time and with improved psychometric quality. Third, types of human
abilities that were difficult to measure because of unwieldy, unreliable,
or extremely costly apparatus are now much more feasible to measure. This
applies especially in the area of perceptual/cognitive abilities. Finally,
advances in microcomputer hardware are so rapid that what was not
feasible yesterday due to high cost is very likely to be feasible today,
and almost certainly will be tomorrow. In light of all this, our approach
to Task 2 will be to make as full use as possible of the capabilities of
present computer technology and to keep foremost in view the strong
possibility of ultimate computer administration of all newly developed
pre-induction measures.
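Shorter, psychometrically stronger measurement is the main promise of computerized adaptive testing, which the Task 1 scientific outcomes also mention. A minimal sketch of one common item-selection rule, assuming a two-parameter logistic (2PL) item response model without the 1.7 scaling constant: at the current ability estimate, administer the unused item with the greatest Fisher information a²P(1 - P). The item bank and function names are hypothetical, and ability re-estimation after each response is omitted.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL probability of a correct response (no 1.7 scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta_hat, bank, administered):
    """Index of the unused item with maximum information at theta_hat.

    bank: list of (a, b) discrimination/difficulty pairs.
    administered: set of indices already given.
    """
    candidates = [i for i in range(len(bank)) if i not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda i: item_information_2pl(theta_hat, *bank[i]))


if __name__ == "__main__":
    bank = [(1.0, 0.0), (2.0, 1.5), (1.5, 0.2)]  # hypothetical item parameters
    print(next_item(0.0, bank, set()))  # most informative item at theta = 0
    print(next_item(0.0, bank, {2}))    # next choice once item 2 has been used
```

Note that the highly discriminating item 1 is not chosen at theta = 0 because its difficulty (1.5) is far from the current estimate; it would become the best choice for a higher-scoring examinee.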


mslon  of  the  Measured  Predictor  Space 


The present pre-induction predictors used for Army enlisted personnel (EP)
selection and classification include the following: scores on the subtests
of the ASVAB; age; sex; high school diploma status; and a biographical
questionnaire, the Military Aptitude Profile (MAP). ASVAB 8/9/10 contains
10 short subtests of cognitive abilities and achievement. The subtests,
in various combinations of three to five each, are used to form
aptitude composites for use in predicting training school success in
various specific MOS groups. Maier (1981) reports mean validity
coefficients ranging from .26 to .52.




Maier (1981) also shows that ASVAB validities generalize, for the most
part, across racial and sex subgroups. Results for ASVAB agree, therefore,
with findings reported by Schmidt and Hunter and their colleagues (Schmidt,
Hunter, & Pearlman, 1981; Schmidt & Hunter, 1978; Schmidt, Hunter, Pearlman,
& Shane, 1979), who have presented convincing evidence that cognitive tests
show empirical validities that appear to generalize quite strongly across a
number of different training and job performance areas in the Army. However,
it is important to note that their investigations, like those
summarized by Maier, have so far been restricted to cognitive measures
and that the degree of validity generalization is greater across training
or school settings than across specific MOS task performance settings.
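Subgroup generalization of this kind is often checked by comparing validity coefficients between independent groups. As a minimal sketch, assuming the standard Fisher r-to-z test, the function below tests whether two subgroup validities differ; its name and the example values are hypothetical. Equal correlations do not imply equal regression lines, so a fuller analysis of subgroup prediction (as in Cleary, 1968) would also compare slopes and intercepts.

```python
import math
from statistics import NormalDist


def compare_validities(r1, n1, r2, n2):
    """Two-sided Fisher r-to-z test of equal validities in two independent groups.

    Returns (z statistic, two-sided p value). Requires n1, n2 > 3.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 cases")
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Hypothetical validities of .50 and .20 in two subgroups of 103 cases each.
    print(compare_validities(0.50, 103, 0.20, 103))
```

In this hypothetical case z is about 2.45 (p about .014), so the two validities would be judged different at the .05 level.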


This set of predictors, dominated by the ASVAB, is somewhat limited when
considered against the total range of tests, questionnaires, and inventories
that have been developed by psychologists over the last 50 years.
Current pre-induction predictors include no psychomotor, perceptual, or
physical measures. Nor do they include measures of personality or vocational
interest. Use of instruments from these domains would certainly
increase the measured portion of the total possible predictor space, and
thereby enhance the possibilities of obtaining improvements in the accuracy
of selection and classification. The net effect of these considerations is
to impress upon us the importance of developing and evaluating a richer
array of pre-induction predictors than have been used thus far in the
assessment of Army EP candidates.

Rationale 

The rationale for the conduct of Task 2 is founded on consideration of
the issues discussed above. Predictors from all the major domains of
measured human attributes will be considered, and considerable attention
will be given to evaluating potential pre-induction predictors from the
viewpoint of behavioral consistency and construct validation. A strong
case must be made for a new predictor, based on its documented empirical
relationship to job performance dimensions similar to those identified as
defining soldier performance, or on its solid rational or theoretical
relationship to such dimensions. Each predictor will be empirically evaluated
with regard to its validity for a number of MOS (validity generalization),
its unique contribution to selecting and classifying candidates
(incremental validity and validity moderation based on MOS differences),
and its present or potential degree of implementation via computer.


SPECIFIC  OBJECTIVES 


Objectives  of  Task  2  research  are  as  follows: 

1. Identify measures of human abilities, attributes, or characteristics
that are most likely to be effective in predicting, prior to entry
into the Army, successful soldier performance in general and in
classifying persons into MOS where they will be most successful, with
special emphasis on attributes not tapped by current pre-induction
measures.

2.  Design  and  develop  new  measures  or  modify  existing  measures  of  these 
"best  bet"  predictors. 

3. Develop materials and procedures for efficiently administering
experimental predictor measures in pilot tests and to the FY83/84 and
FY86/87 cohorts.

4.  Estimate  and  evaluate  the  reliability  of  new  pre-induction  measures 
and  their  vulnerability  to  motivational  set  differences,  faking, 
variances  In  administrative  settings,  and  practice  effects. 

5.  Determine  the  Interrelationships  (or  covariance)  between  the  new 
pre-induction  measures  and  current  pre-induction  measures. 


6.  Determine  the  degree  to  which  the  validity  of  'tew  pre-induction 
measures  generalizes  across  MOS,  l.e.,  proves  useful  for  predicting 
measures  of  successful  soldier  performance  across  quite  different  MOS 
and,  conversely,  the  degree  to  which  the  measures  are  useful  for 
classification  or  the  differential  prediction  of  success  across  MOS. 

7.  Determine  the  extent  to  which  new  pre-induction  measures  Increase  the 
accuracy  of  prediction  of  success  and  the  accuracy  of  classification 
Into  MOS  over  and  above  the  levels  of  accuracy  reached  by  current 
pre-induction  measures. 




OVERALL SUMMARY OF THE PROCEDURE


There are 15 procedural steps, or subtasks, in Task 2. Figure 2.1 shows a
time table for the subtasks, and Figure 2.2 shows the relationships between
the subtasks and the general nature of inputs required from other Project A
tasks. Below, we briefly summarize each of the 15 subtasks. The subtasks are
then described in detail in the PROCEDURE section that follows.

1. Literature search and planning. Civilian and military research on
the relative "success" of predictors for purposes of personnel selection
and classification will be searched. Results of the search will be
organized to facilitate selection of a preliminary battery and formal
technical and cost reviews of potential predictors.

2. Selection of preliminary battery and preparation for administration to
FY83/84 longitudinal sample. A set of "off-the-shelf" predictors that
comprehensively and efficiently covers the predictor space will be
identified by Task 2 staff and reviewed by ARI. After approval, the
predictors will be obtained and administration procedures prepared.

3. Administration of preliminary battery to FY83/84 longitudinal sample.
The preliminary battery will be administered to a sample of 2,100-4,600
soldiers in training for each of four MOS: 05C, 19E/K, 63B, and 71L.
On-site administrators will be trained by Task 2 staff, and the
administration process will be monitored by Task 2 staff.


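[Figure 2.1. Time table for Task 2 subtasks, by project months and
calendar periods.]

[Figure 2.2. Relationships between Task 2 subtasks and inputs from other
Project A tasks.]

Note. Task 1 provides extensive inputs to almost all phases of Task 2
research. The text discusses this relationship in detail, but those
relationships are not depicted graphically here.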


4. Technical review of predictor constructs and measures. Experts will
be used to make formal judgments about the usefulness of predictor
constructs and measures for predicting soldier performance. Analyses
of these judgments will identify a set of predictors judged to be
"best bets" in terms of validity and efficiency, i.e., minimal overlap
between predictors.

5. Cost/administrative practicality review. A panel of Army personnel
knowledgeable about the field operation of recruitment, selection, and
classification will be given information about the administrative
procedures, costs, item types, etc., of the predictors surviving the
technical review. They will make judgments about administrative
feasibility, possible operational problems, etc.

6. Initial development of predictors for the Trial Battery (new
predictors). Predictors chosen for development on the basis of Subtasks
4 and 5 will be designed, and small-scale tryouts will be held.

7. Pilot tests of Trial Battery (new predictors) in the field. The new
predictors will be administered to a sample of soldiers from the
FY81/82 cohort and a sample of applicants. Data will be collected to
allow the investigation of practice effects, fakability, motivational
effects, and stability of measures.

8. Analysis of Trial Battery. The pilot test data will be analyzed to
investigate administrative problems, applicant acceptance, psychometric
properties, fakability and practice effects, and covariances of new
predictors, current pre-induction predictors, and any criterion
information available for the soldiers in the sample.




9. Analyze Preliminary Battery: FY83/84 cohort school and preliminary
battery data. The covariance of the predictors in the Preliminary
Battery with current pre-induction predictors and training success
will be analyzed. Differences among MOS in scores on the constructs
measured by the Preliminary Battery will also be analyzed.

10. Prepare revised Trial Battery for FY83/84 cohort predictor/performance
data collection. Information from Subtasks 8 and 9 will be integrated
and plans formulated for revising the new predictors. After review
and approval by ARI, revisions of the new predictors will give rise to
the revised Trial Battery. Training will be provided to Project A
staff responsible for administering the battery.

11. Monitor/assist administration of Trial Battery to FY83/84 cohort.
Although the major burden of Trial Battery administration will be
borne by other Project A staff, Task 2 staff will administer the
battery for test-retest purposes and to a sample of new recruits, as
well as provide "on-call" assistance for the major administration effort.

12. Analyze FY83/84 cohort data: Trial Battery/performance measures.
Data will be available for concurrent validity analyses of the Trial
Battery and predictive validity analyses of the Preliminary Battery,
although on fewer MOS for the latter than for the former. Fairness
analyses, generalizability analyses, and other analyses will also be
conducted.

13. Prepare Experimental Battery and prepare for administration to FY86/87
cohort. Based on analyses from Subtask 12, the Experimental Battery,
i.e., the final, revised version of the Trial Battery, will be
prepared. Test administration materials will be prepared, and
personnel designated as on-site test administrators will be trained by
Task 2 staff.

14. Monitor administration of Experimental Battery to FY86/87 cohort;
further analyses of FY83/84 cohort data. The administration of the
Experimental Battery will be carried out by Army personnel on site at
training schools. Task 2 staff will make several scheduled inspection
visits as well as any unscheduled visits necessary to respond to
problems. During this time, further analyses of the FY83/84 cohort data
will be carried out.

15. Analyze FY86/87 cohort data and prepare final reports. Predictor
response distributions, covariances, etc., of the FY83/84 cohort and
FY86/87 cohort will be compared to ascertain whether substantial
differences occur because of attrition (in the concurrent FY83/84 cohort
sample) and other factors. Relationships of the predictors to
training performance will be analyzed. Draft and final instrument and
technical reports will be prepared.




We turn now to a short explication of the relationships between subtasks.
As Figure 2.2 shows, there are five major phases in Task 2. The first
phase is the literature search and planning (Subtask 1), which provides
information and direction for the next phases. The next two phases are
the selection, administration, and analysis of a preliminary battery of
predictors (Subtasks 2, 3, and 9), and the development, pilot test, and
analysis of a set of new predictors, called the Trial Battery (Subtasks 4,
5, 6, 7, and 8). Assuming approval by ARI, the Preliminary Battery will be
a set of well-established "off-the-shelf" measures that best covers the
relevant predictor domains as indicated by the literature search. The new
predictors will be newly developed or modified measures intended to measure
those predictor constructs deemed most likely to be effective for
predicting soldier performance, as determined by the literature search and
two rigorous, formal evaluation steps (Subtasks 4 and 5). Including the
Preliminary Battery of well-established predictors at this stage in the
project allows an early examination of the covariance between present
pre-induction predictors and types of predictors different from those
currently used. It also allows an examination of the ability of these
different kinds of predictors to predict performance during training. In
addition, it makes possible a predictive validity study of the measures in
the Preliminary Battery, since job performance information will later be
collected (in Subtask 11 and in Task 5) on members of the FY83/84 cohort
tested at this time.

Note that these two phases proceed somewhat in parallel. Early analyses of
the Preliminary Battery (Subtask 9) will inform the development of the
Trial Battery (Subtask 6). Thus, if these early analyses show that some
of the constructs measured in the Preliminary Battery are highly redundant
with ASVAB or other current pre-induction measures, or with other measures
in the Preliminary Battery, work will not proceed on developing new
measures of those constructs. Work on Subtask 6 will not be held up,
however, since there will be constructs not included in the Preliminary
Battery on which development work can proceed, particularly in the
perceptual/psychomotor area.

The fourth phase (Subtasks 10, 11, and 12) includes the development and
administration of the revised Trial Battery, an improved version of the
new predictors, building on inputs from Subtasks 8 and 9. In addition,
Tasks 4 and 5 provide Army-wide and MOS-specific job performance measures
that make possible two validity investigations in this phase: a
concurrent validity effort for the Trial Battery and a predictive
validity effort for the Preliminary Battery (using as subjects those
soldiers who completed the Preliminary Battery in Subtask 3 and for whom
job performance criteria are collected in Subtask 11).
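Once validity coefficients are available for several MOS, one common way to examine whether a predictor's validity generalizes across MOS (Objective 6) is a "bare-bones" meta-analysis in the Hunter-Schmidt tradition. That approach compares the observed variance of the validities with the variance expected from sampling error alone, (1 − r̄²)² / (N̄ − 1). The Python sketch below assumes one validity coefficient and one sample size per MOS. It applies no corrections for criterion unreliability or range restriction, and its inputs are invented, not Project A data.

```python
def bare_bones_validity_generalization(validities, sample_sizes):
    """Sample-size-weighted 'bare-bones' summary of validities across MOS.

    Returns (mean validity, observed variance, expected sampling-error
    variance, residual variance). No artifact corrections are applied.
    """
    if len(validities) != len(sample_sizes) or not validities:
        raise ValueError("need one sample size per validity coefficient")
    total_n = sum(sample_sizes)
    r_bar = sum(n * r for r, n in zip(validities, sample_sizes)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2
                  for r, n in zip(validities, sample_sizes)) / total_n
    n_bar = total_n / len(sample_sizes)
    var_err = (1 - r_bar**2) ** 2 / (n_bar - 1)
    var_res = max(var_obs - var_err, 0.0)  # variance not explained by sampling error
    return r_bar, var_obs, var_err, var_res


# Hypothetical validities of one predictor in four MOS, with sample sizes.
r_bar, var_obs, var_err, var_res = bare_bones_validity_generalization(
    [0.25, 0.31, 0.22, 0.28], [400, 350, 500, 300]
)
share = min(var_err / var_obs, 1.0) if var_obs > 0 else 1.0
print(f"mean r = {r_bar:.3f}; share of variance due to sampling error = {share:.2f}")
```

When sampling error accounts for most of the observed variance, the differences among MOS validities are consistent with chance, which supports generalization. A large residual variance instead points toward MOS as a moderator, the classification-relevant case.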

The final phase of Task 2, consisting of Subtasks 13, 14, and 15, includes
the preparation of the Experimental Battery based on input from Subtask 12,
administration of the Experimental Battery to members of the FY86/87
cohort, analysis of the relationship of the battery to training criteria,
and subsequent final revisions to the battery.

Although Task 2 formally ends at this point, job performance criterion data
will later be collected by other Project A staff (from Task 5), and a
predictive validity analysis of the Experimental Battery will be completed
by Task 1 staff.

We turn now to a delineation of the inter-task dependencies between Task 2
and Tasks 1, 3, 4, and 5.

Task 1. Task 2 and Task 1 will coordinate their work very closely. This
is necessary because, in general, Task 2 staff will develop predictor
measures and collect the data in the field, while Task 1 staff will receive
the collected data and prepare and edit the data. With regard to analyses,
Task 2 staff will perform those analyses directly related to development
and refinement of measures, while Task 1 staff will perform the analyses
directly related to validation of predictors. In reality, however,
development of measures and collection of data have direct bearing on
preparation and editing of data files; likewise, analyses aimed at
development and refinement of measures have implications for validation
analyses. Finally, the results of validation analyses feed directly into
refinement of predictors. This simply means that Task 2 staff will be
responsible for close communication with Task 1 staff with regard to
formats and content of predictor measures, methods of data collection, and
anticipated developmental analyses. Task 1 staff will be responsible for
providing guidance and advice on these matters, particularly with regard
to anticipated problems in data preparation or editing and alternative
methods of developmental analyses. They will also be responsible for
communicating planned validation analyses to Task 2 staff, so that Task 2
staff, in turn, may provide feedback on the usefulness of those analyses
for further refinement and development of measures.

In addition, all training or job performance criterion measures that are
input to analyses performed by Task 2, or that affect Task 2 efforts, will
be available only through Task 1, since Task 1 is managing the data base.
On the other hand, conceptual information or constructs underlying
training and job performance criteria will pass directly from Task 3, 4,
and 5 staff to Task 2 staff.

Task 3. Task 3 staff will develop measures of training performance and be
responsible for collecting these data. These data will be added to the
longitudinal research data base (LRDB) by Task 1 and will then be available
for analyses by Task 2. Subtasks 9 and 15 of Task 2 rely on Task 3
providing training measures (through Task 1). In addition, Task 3 staff
will provide to Task 2 staff conceptual information about training
performance constructs, based on their early work with training school
instructors and review of training measures. Task 2 staff will use this
information in Subtask 4 (technical review) in September 1983.

Tasks 4 and 5. Both of these tasks are responsible for developing
improved measures of job performance and collecting these data. There are
three major types of interaction between these two tasks and Task 2. The
first kind of interaction is the provision to Task 2 staff of conceptual
information about job performance. This information is input to Subtasks 2
(selection of a Preliminary Battery) and 4 (technical review of predictor
constructs and measures). Staff from Tasks 4 and 5 have already provided
information to Task 2 staff, and that information is being used in Subtask
2. Task 2 staff have provided Tasks 4 and 5 staff with further guidance on
the nature and format of their information requirements for Subtask 4,
based in part on the job performance information already provided to Task
2. The information from Tasks 4 and 5 is required by September 1, 1983.

The second interaction is the provision of data on job performance. Task 5
staff will collect this information and provide it to Task 1 staff, who
will make it available to Task 2 staff through the LRDB. These data are
required for the analyses carried out by Task 2 staff in Subtask 12,
beginning in October 1985.

Finally, the staff of Task 5 is responsible for the collection of data on
the predictor measures administered to the FY83/84 cohort during
June-September 1985. Task 2 staff will provide the predictor measures,
administration manuals, and training required to administer the Trial
Battery. The data collection teams from Task 5 will then administer the
measures, with on-call assistance from Task 2 staff.


PROCEDURE 


Subtask 1: Literature Search and Planning

Rationale. As mentioned earlier, the present set of pre-induction
predictors does not include measures of several domains of human
attributes that have been useful in other settings for predicting work
performance and for classifying persons according to the occupations for
which they are most qualified. In this regard, there appear to be dozens
of constructs in the perceptual, psychomotor, biographical and vocational
interest, and cognitive domains that are not presently measured during
current Army pre-induction screening procedures.

A listing of from 70 to 90 variables covering the major human attribute
domains can be thought of as the "whole person" approach to identifying a
relevant set of job performance predictors. In theory, if all major
domains of human attributes were adequately covered by appropriate
measures, one would expect to be able to predict performance in almost
any job. Even if this were the case, it is obvious that we will not
have the luxury of measuring so many human attributes.

Thus, it is necessary in our research to narrow our gaze eventually to
those constructs that have the greatest likelihood of meeting our
aforestated emphasis on performance relatedness. The literature search,
therefore, must identify as many potentially useful predictors as possible
and provide information about those predictors in a manner useful for
Subtasks 2 and 4 (selection of a preliminary battery and technical review
of predictors).

In addition, this subtask includes substantial effort devoted to planning.

Since the project began, new information has required some changes in
project design, and, consequently, revisions to Task 2 procedures and
resource allocations have become necessary. We have no doubt that constant
attention to plans and, when necessary, modification of those plans will
continue throughout the project.

Procedures. For convenience, the potential predictor domain has been
divided into three areas: (a) cognitive/perceptual; (b) vocational
interest and biographical; and (c) psychomotor abilities. A team has been
formed for each domain, consisting of a leader with one to three research
associates or assistants, and one or more expert consultants. Each team
has been searching and reviewing the literature within its domain. Team
leaders report to the task leader, who has coordinated the search.

1. Review forms. Two forms have been developed to record information
from documents as they are reviewed. The intent of these review forms
is to capture the information so that a critical, technical review can
be performed later, in Subtasks 2 and 4. (The initial reviewers will
also make a critical review, but they primarily ensure that adequate
information is recorded for the later technical review.) The content
of the review forms is identical across domain teams and includes
information relevant to evaluating each potential predictor.


2. Search plans. Plans for searching each domain have been formulated.
Predictors used currently or in the past by the Army and other services
are being reviewed, as well as predictors used in the public and private
sectors. Published and unpublished literature is being searched.
Appropriate computer searches of educational, psychological, business,
and government areas have been completed. Journals known to be highly
relevant have been systematically reviewed. Domain experts have
identified unpublished or "in press" research, as well as researchers to
be directly queried.


3. Management of search. Team leaders will monitor the completed reviews
for completeness and accuracy. A list of citations will be compiled
within domains and reviewed by team leaders and the expert to ensure
comprehensiveness of the review. Team leaders will organize the reviews
by predictor content or construct and, if possible, cross-reference them
by criteria "predicted." The task leader will review citation lists and a
sample of completed reviews and meet with team leaders at least biweekly.

The task leader will also maintain contact with leaders of other tasks,
especially Tasks 4 and 5, in order to obtain and update performance
criteria information. As this information becomes available, its
implications for the predictor literature search will be evaluated and
used to redirect the search, if necessary. Thus, if delinquency (AWOL,
drunk driving, drug abuse, etc.) appears to be an important criterion
set, then research attempting to predict such phenomena will be reviewed,
if it has not been reviewed previously.




4. Planning. The task leader has kept (and will continue to keep) in
close contact with his counterpart at ARI and with the project director,
principal scientist, director of technical planning, and other task
leaders of Project A. He has attended a course on PAC III in order to
facilitate future resource planning. Written revisions of the research
plan will be prepared as necessary, but telephone, WYLBUR, and personal
meetings among the persons mentioned above will be the primary method of
keeping plans on track.


Subtask 2: Selection of Preliminary Battery and Preparation for
Administration to FY83/84 Longitudinal Sample

Rationale. After the project began, it became apparent that it would be
desirable to collect data on new pre-induction predictors from a relatively
large sample of soldiers at a point somewhat earlier in the project than
originally planned. With regard to Task 2, there are two primary reasons
for administering a preliminary battery. First, collecting data on a
number of new predictors that comprehensively represent the types of
predictors not currently in use will allow an early determination of the
extent to which such predictors contribute unique variance, i.e., actually
measure human attributes not measured by current pre-induction predictors.
This information will be useful for guiding the development of new
predictors into areas most likely to be useful for increasing the accuracy
of prediction and classification. Second, the early collection of
preliminary battery data on soldiers during their advanced training phase
allows the conduct of a predictive validity investigation using new
pre-induction predictors much earlier in the project. Thus, empirical data
on the predictive validity of new predictor constructs will be available 36
months after the project begins.

The purpose of this subtask, then, is to make a careful selection of
measures of new pre-induction predictor constructs so that advantage is
taken of the design features just described.


Procedures. The literature review forms will serve as the primary input to
a careful critical review of potential measures for inclusion in the
Preliminary Battery. The Preliminary Battery must necessarily be made up of
"off-the-shelf" instruments because there is too little time before the
scheduled administration of the Preliminary Battery to develop and pilot
test new measures of constructs deemed potentially useful. This means, for
example, that a single published interest instrument that is judged to
best cover the constructs in that area will probably be chosen, rather than
selecting several scales from each of several different instruments and
printing a new instrument containing those scales.

Assuming ARI approval (for research purposes) of the use of
"off-the-shelf" measures, the review process for selection of the battery
will consist of two major steps: an internal review by Task 2 staff, with
cooperation from the ARI Task 2 monitor, and a presentation to, and
subsequent review by, ARI of the candidate predictors selected for the
Preliminary Battery. The internal review will proceed as follows. Each
domain team will prepare an initial list of possible predictor measures,
organized within fairly broad constructs (List 1). The teams will then
apply the 13 technical predictor evaluation criteria (previously submitted
to ARI) to narrow this list to those predictors that are serious
possibilities. Thus, predictors with major problems with regard to one or
more of the 13 evaluation criteria will be eliminated. In addition, those
predictors that are not readily available for large-scale administration,
i.e., are not really "off-the-shelf," will be eliminated (List 2). List 2
will then be examined with regard to the 13 criteria, but in a comparative
sense: predictors will be compared to each other, as well as against the
evaluation criteria. Also, the job performance criterion information
presently in hand from Tasks 4 and 5 will be cast against the candidate
predictors. This process will result in a third list (List 3) that
contains the recommended predictor measures and one or more alternatives,
all ranked by preference. List 3 will be presented to the task leader and
all other Task 2 staff in a review meeting, and the choices of predictors
will be examined and, if necessary, modified during this meeting. We think
it would be extremely useful for the ARI Task 2 monitor to participate in
this review meeting, which will occur during the last part of April or
the first week in May.
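The narrowing from List 1 to List 3 is, in effect, a screen-then-rank procedure. The Python sketch below is a hypothetical illustration only. It assumes each candidate has been rated on the 13 evaluation criteria on an invented 1-5 scale, treats any rating of 2 or lower as a "major problem," and ranks the survivors within each construct by mean rating. In the plan itself, the comparative step is a staff judgment informed by criterion information from Tasks 4 and 5, not a formula.

```python
from dataclasses import dataclass
from statistics import mean

N_CRITERIA = 13  # number of technical predictor evaluation criteria in the plan


@dataclass
class Candidate:
    name: str
    construct: str
    ratings: list[float]   # hypothetical 1-5 rating on each evaluation criterion
    off_the_shelf: bool    # readily available for large-scale administration

    def __post_init__(self):
        if len(self.ratings) != N_CRITERIA:
            raise ValueError(f"{self.name}: expected {N_CRITERIA} ratings")


def screen(list1, major_problem_cutoff=2.0):
    """List 1 -> List 2: drop candidates that are not truly off-the-shelf or
    that have a major problem (rating <= cutoff) on any criterion."""
    return [c for c in list1
            if c.off_the_shelf and min(c.ratings) > major_problem_cutoff]


def rank_within_constructs(list2):
    """List 2 -> List 3: within each construct, order candidates by mean
    rating; the first is the recommended measure, the rest are alternatives."""
    ranked = {}
    for cand in sorted(list2, key=lambda c: mean(c.ratings), reverse=True):
        ranked.setdefault(cand.construct, []).append(cand)
    return ranked


# Hypothetical candidates (names and ratings invented for illustration).
list1 = [
    Candidate("Spatial test A", "spatial", [4.0] * 13, True),
    Candidate("Spatial test B", "spatial", [3.0] * 13, True),
    Candidate("Spatial test C", "spatial", [1.0] + [5.0] * 12, True),
    Candidate("Interest inventory D", "interest", [5.0] * 13, False),
    Candidate("Interest inventory E", "interest", [3.5] * 13, True),
]
list3 = rank_within_constructs(screen(list1))
for construct, cands in list3.items():
    print(construct, [c.name for c in cands])
```

Keeping the screen and the ranking as separate functions mirrors the separate List 2 and List 3 steps, so either rule could be changed without touching the other.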

Following this meeting, the recommended Preliminary Battery, including
suitable alternative measures, will be presented to ARI by the Task 2
leader. This could best be accomplished in a one-day meeting in Washington
in mid-May 1983. After approval of the battery by ARI and other necessary
reviewing authorities, Task 2 staff will procure the materials necessary to
administer the battery, make arrangements with publishers, and carry out
any other activities necessary to secure the instruments in sufficient
numbers to complete the administration of the battery.

The next major step will involve planning the administration process and
preparing manuals and training materials for use by the on-site
administrators. As part of the preparation, we will pretest the selected
instruments on a sample of 40 soldiers in training or in a variety of
MOS, from either TRADOC or FORSCOM. The purpose of this pretest is to
identify problems with test instructions and logistics of the
administration process. Four hours of soldier time would be required.
This pretest would occur in July 1983.

Two points need to be raised here about the Preliminary Battery. First, no
computer-administered measures will be included, due to the short lead time
available to prepare for the administration. (We note, however, that such
measures will be intensively investigated and developed as part of the
development of the Trial Battery.) Second, initial inquiries of TRADOC
indicate that soldier time at AIT schools is generally allocated in
four-hour blocks. We think four hours is sufficient time to administer
"off-the-shelf" measures of biographical information, vocational interest,
and cognitive/perceptual ability. These measures can be group
administered. Physical and psychomotor measures present some difficulty,
however: almost all of these measures require individual administration.
It appears to us at this time that administering both a sufficiently
comprehensive set of individually administered psychomotor tests and the
group-administered measures will be very difficult to accomplish in four
hours. Furthermore, individually administered measures require more
expertise on the part of the administrator than may be available on site.
If this is the case, much more training of administrators would be
required.


These problems could be overcome by obtaining more soldier time and
expending more resources on training administrators, or by electing to cut
back somewhat on the range of constructs covered in the Preliminary
Battery. We favor the latter option at this time. Recall that one of the
primary reasons for the Preliminary Battery is to determine the amount of
unique variance that would be contributed by new predictors. The
psychomotor measures are perhaps the least likely of the potential set of
new predictors to correlate highly with the major current predictor, the
ASVAB. Therefore, there seems much less cause for concern if they are not
included in the Preliminary Battery. Again, we point out that psychomotor
measures will be investigated as part of the Trial Battery development.
Measures of biodata, vocational interests, and cognitive/perceptual
abilities are of much more interest with regard to their overlap with the
ASVAB and with each other, and should definitely be included.

As part of this subtask, a detailed outline of the literature review report
will be prepared and delivered to ARI. (The full report will be prepared
and delivered after the Preliminary Battery is in the field, i.e., in
Subtask 4.)


Subtask 3: Administration of Preliminary Battery to FY83/84 Longitudinal Sample

Rationale. The Preliminary Battery will be administered to soldiers in AIT
for four MOS: 05C (Fort Gordon), 71L (Fort Jackson), 63B (Fort Dix and
Fort Leonard Wood), and 19E/K (Fort Knox). These four MOS were picked to
represent a diversity of job types and because they had sufficient numbers
going through AIT to meet sample size requirements, i.e., enough tested
soldiers in these MOS will still be in the Army and available for
collection of job performance criterion data in June-September, 1985 at the
sites visited (see INTRODUCTION section on sampling). Local on-site
administrators will be required to administer the Preliminary Battery.
This is necessary because it is impractical for Task 2 staff to travel to
the five sites each time a new class begins (the soldiers will be tested
during the first week of training; see Procedures for this subtask) or to
live on-site for the long period over which the data will be collected
(October, 1983-June, 1984). Thus, Task 2 will be responsible for training
these administrators and monitoring the administration process.

Procedures. During the month preceding the beginning of data collection
(September, 1983), Task 2 staff will visit each of the five administration
sites for approximately one week. Prior to these visits, arrangements for
selecting the on-site administrators will be completed. The administrators
will in all likelihood be either active duty Army personnel with
appropriate backgrounds in training or personnel, i.e., general familiarity
with testing, training, or personnel work; or local, contracted personnel
with appropriate experience. We have been making, and continue to make,
inquiries at TRADOC posts about the availability of personnel for test
administration. These inquiries, to date, indicate that active duty Army
personnel with appropriate experience are present at the sites (primarily,
these personnel are AIT instructors) and could administer the tests. We do
not, of course, have commitments for such personnel, since these can only
come through appropriate troop support requests. Use of Army personnel,
rather than local contracted personnel, would avoid incurring expenses not
originally budgeted.

Regarding the selection and training of test administrators, we prefer to
rely primarily on training. Task 2 staff have had extensive experience in
preparing test administration procedures and manuals and in providing
training for test administrators. We have trained persons with little
experience in testing in several large-scale validation studies and
achieved satisfactory results in terms of data quality (Dunnette et al.,
1981; Peterson & Houston, 1980; Peterson, Houston, & Rosse, in press). A
minimum of two administrators should be selected for training. The actual
number depends on the size of classes and the desired size of the pool of
trained administrators; it could be as large as 10 to 12 persons at some
posts. It is essential that one person at each site be designated as the
primary contact. This person does not necessarily need to be involved in
the actual administration of the battery but will be responsible for
securely storing all supplies, shipping completed batteries to Task 2
staff, monitoring attendance at testing sessions, and communicating with
Task 2 staff.

The site visit activities will include a one-day training session for the
administrators on administration procedures, inspection of facilities,
completion of session scheduling, and other details. At this point, we
believe the facilities required will be one or two air-conditioned
classrooms, each large enough to accommodate at least 50 persons and
equipped with student desks. (One classroom will normally be sufficient at
some sites, based on TRADOC estimates of class size, but some classes at
some sites are large enough to require two or even more classrooms in order
to complete testing of an entire class during the same day.)

After the initial site visits, Task 2 staff will communicate with each site
by telephone at least weekly to monitor administration activities. In
addition, batteries will be sent to Task 2 staff as soon as they are
completed and will be inspected to detect any abnormalities or problems. As
problems surface, Task 2 staff will return to the sites to help solve them.
Even if no problems surface, at least two "monitoring" site visits will be
made to observe the administration procedures.

Two final comments about data collection are in order. First, the
Preliminary Battery should be administered to soldiers during their first
week at AIT, if possible. Recall that the Preliminary Battery scores will be
correlated with training performance (Subtask 9) and, later, with job
performance. Thus, the earlier in AIT the battery is administered, the less
opportunity there is for the training itself to influence scores on the
battery and, therefore, to contaminate the correlational analyses.

Second, we have planned our schedule to begin testing in October, 1983.
There are two reasons for this:

1. The summer months see a large influx of National Guard and Reserve
soldiers for training. This complicates the administration process because
we cannot use these soldiers for purposes of the predictive validity study,
and TRADOC has advised us that it is administratively very difficult to
separate regular soldiers from Reserve and National Guard soldiers within
classes for purposes such as the Preliminary Battery testing. That is, each
class is treated as a single unit, and apparently 40-60 percent of a class
could be Reserve or National Guard during the summer.

2. There are at least two other programs requiring testing of soldiers at
TRADOC for some period during the summer: the Basic Skills Educational
Program (BSEP) and validation work on a physical fitness battery.

These two facts make an October start date more feasible than a summer
start date. We have obtained estimates of the input for the four MOS, and
an October start date would still permit collection of the targeted sample
sizes in two of the four MOS (and nearly so in the other two).


Draft field test plans containing full details on the tests to be
administered and the facilities required will also be prepared as part of
this subtask, after the content of the Preliminary Battery has been
determined. A final field test plan will be prepared after ARI review.

Subtask  4:  Technical  Review  of  Predictor  Constructs  and  Measures 




Rationale. A great deal of information will be discovered and reviewed
during the literature search. This information must be subjected to a
careful, thorough review in order to identify the "best bet" set of
predictor constructs and measures. One part of this subtask is designed to
achieve that goal through a formal judgment process employing experts. The
method for this subtask has been used successfully by Bownas and Heckman
(1976); Peterson, Houston, Bosshardt, and Dunnette (1977); Peterson and
Houston (1980); and Peterson, Houston, and Rosse (in press) in identifying
predictors for the jobs of firefighter, correctional officer, and
entry-level occupations (clerical and technical), respectively. Peterson
and Bownas (1982) provide a complete description of the methodology. In
this technique, descriptive information about a set of predictors and the
job performance criterion variables is given to "experts." The experts
estimate the relationship between predictor and criterion variables,
generally on a five-point scale or even by directly estimating the
correlation coefficients. The final result is a matrix with predictor and
criterion variables as the columns and rows, respectively. Cell entries are
the experts' estimates of the degree of relationship between the particular
predictors and the various criteria. The interrater reliability of the
experts' estimates is checked first. In general, these reliabilities have
been impressive, in the .80 to .90 range for about 10 to 12 experts. If the
estimates are reliable, the matrix of predictor-criterion relationships can
be analyzed and used in a number of ways. The covariances of the predictors
can be estimated from the profiles of their estimated relationships with
the criteria, that is, by correlating the columns. These covariances can
then be factor analyzed in order to identify predictors that function
similarly with regard to predicting the job performance criteria. The
covariances of the criteria can be similarly examined, and criteria likely
to be predicted by a common set of predictors can be identified. In
addition, equations relating predictors to criteria can be derived for
either the predictor variables or the predictor factors. In this way,
redundancies and overlap in the predictor set can be identified and an
efficient, integrated set of predictors chosen to carry forward into later
phases of the research.
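The reliability check and the column correlations described above can be
sketched in a few lines. This is a minimal illustration under stated
assumptions, not the procedure documented by Peterson and Bownas (1982): it
assumes interrater reliability is estimated as the mean pairwise correlation
between the experts' flattened matrices, stepped up with the Spearman-Brown
formula to the reliability of the mean matrix. The function names and the
ratings are hypothetical, and the subsequent factor analysis is not shown.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_single, k):
    """Reliability of the mean of k raters, given the reliability of one rater."""
    return k * r_single / (1 + (k - 1) * r_single)


def pooled_reliability(ratings):
    """ratings[e][c][p]: expert e's estimate for criterion c (row), predictor p (column).

    Each expert's matrix is flattened, pairwise inter-expert correlations are
    averaged, and the average is stepped up to the reliability of the k-expert mean.
    """
    flat = [[v for row in matrix for v in row] for matrix in ratings]
    r_bar = mean(pearson(a, b) for a, b in combinations(flat, 2))
    return r_bar, spearman_brown(r_bar, len(ratings))


def mean_matrix(ratings):
    """Cell-by-cell mean of the experts' matrices (criteria x predictors)."""
    n_crit, n_pred = len(ratings[0]), len(ratings[0][0])
    return [[mean(m[c][p] for m in ratings) for p in range(n_pred)]
            for c in range(n_crit)]


def predictor_profile_correlations(matrix):
    """Correlate the columns (predictors) across the rows (criteria)."""
    columns = [list(col) for col in zip(*matrix)]
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical ratings: 3 experts, 3 criterion rows, 3 predictor columns, 1-5 scale.
ratings = [
    [[4, 2, 1], [3, 4, 2], [1, 2, 5]],
    [[5, 2, 1], [3, 5, 2], [1, 1, 4]],
    [[4, 3, 2], [2, 4, 1], [2, 2, 5]],
]

r_bar, reliability = pooled_reliability(ratings)
profiles = predictor_profile_correlations(mean_matrix(ratings))
print(f"mean inter-expert r = {r_bar:.2f}, reliability of mean = {reliability:.2f}")
for row in profiles:
    print(" ".join(f"{r:5.2f}" for r in row))
```

If the pooled reliability is high, the mean matrix can stand in for the
panel; large off-diagonal entries in the profile matrix flag predictors with
similar estimated validity profiles, which is the redundancy the factor
analysis is meant to expose.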

Previous use of this methodology (studies cited above) shows clearly that
when content-valid performance criteria, i.e., criteria based on accurate
job information derived from task analyses, critical incident studies,
etc., and predictor measures found through a careful literature search are
used by experienced psychologists and other experts, reliable estimates are
obtained. Moreover, predictors selected and weighted according to such
estimates have shown significant empirical relationships with measures of
job performance in follow-up studies.

In addition to the formal judgment process just described, another type of
technical review will be carried out for the perceptual/psychomotor domain
of measures. In the course of our literature search and initial review of
research in this area, it has become apparent that relatively little
research, especially criterion-related validation work, has been completed
on recently developed computer-administered perceptual/psychomotor
measures. Computer administration, however, has great practical and
theoretical advantages over the older methods that use more cumbersome
mechanical and electrical apparatus. It is our present judgment that
perceptual/psychomotor measures should be pursued only via computer
administration; it makes little theoretical or practical sense to attempt
to validate and operationalize the 1950s technology of measuring such
variables. Therefore, we will carry out a less formal and more wide-ranging
review of predictor measures in this area, to take place prior to the
formal expert judgment review. It is desirable to carry out this review as
early as possible so that we may quickly begin developmental work on these
measures (in Subtask 6).

Procedures. The first two activities in this subtask related to the formal
judgment process are the development of the definitions of the rows
(criterion constructs) and columns (predictor measures) of the judgment
matrix. With regard to the rows or criterion constructs, Task 2 staff have
requested information from the staff of Tasks 3, 4, and 5. The products
that Task 2 staff will receive from these three tasks will share a common
format: the name of the criterion construct or dimension; a brief
definition of the construct; elaborations, examples, illustrative
materials, or other explanatory information; and a brief description of the
data base, analytic methods used, or other information that would allow
Task 2 staff to properly evaluate the information. The substantive content
will, of course, vary across the three other tasks, as will the sources of
information they draw upon to develop their sets of criterion constructs.
After receiving these lists of criterion constructs from Tasks 3, 4, and 5,
Task 2 staff will review, edit, and integrate them as required for the
technical review process. (Task leaders from Tasks 3, 4, and 5 have
informed the Task 2 leader that they can develop such lists and deliver
them to Task 2 staff by September 1, 1983.)

With regard to the columns or predictors, the task leader and team leaders
will critically evaluate all predictor information contained in the
literature review forms completed during the literature search (Subtask 1).
Sets of "best bet" predictors will be chosen by applying the evaluation
criteria mentioned in Subtask 2. If sufficient information is available for
predictors, Bayesian priors will be computed as outlined by Schmidt and
colleagues (Schmidt, Hunter, & Caplan, 1981). At this point, more
predictors will be retained than would be feasible to study further. Within
each domain, the selected predictors will be placed in one of three
categories: (a) predictors with existing, adequate measures; (b) predictors
with existing measures that require some modification prior to Army use;
and (c) predictor constructs or variables with no existing measures, or
with measures that require almost total development. (Only predictor
constructs with very strong theoretical or content-related promise would be
retained in the third category. On the other hand, such "new" predictors
might very well be the most desirable in terms of predicting portions of
the criterion space not now predicted.) Task 2 staff will then prepare a
packet for each predictor that provides a concise, comprehensive
description of the construct measured, reliability, validities, adverse
impact, etc.
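The plan does not spell out how the Bayesian priors would be computed. A
minimal sketch, assuming a "bare-bones" validity-generalization approach:
the prior mean is the sample-size-weighted mean of published validities,
and the prior variance is their observed variance minus the variance
expected from sampling error alone. Whether this matches Schmidt, Hunter,
and Caplan (1981) in detail is an assumption; a full analysis would also
correct for range restriction and criterion unreliability. The update step
is a generic normal-normal Bayesian combination added only for
illustration, and the function names and study values are hypothetical.

```python
def bare_bones_prior(studies):
    """Prior mean and SD of true validity from (observed r, sample size n) pairs.

    The N-weighted variance of the observed validities is reduced by the
    variance expected from sampling error; no artifact corrections are applied.
    """
    total_n = sum(n for _, n in studies)
    r_bar = sum(r * n for r, n in studies) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in studies) / total_n
    n_bar = total_n / len(studies)
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    var_rho = max(0.0, var_obs - var_err)
    return r_bar, var_rho ** 0.5


def update_with_local_study(prior_mean, prior_sd, r_local, n_local):
    """Precision-weighted (normal-normal) update of the prior with one new study."""
    var_local = (1 - r_local ** 2) ** 2 / (n_local - 1)
    if prior_sd == 0:
        # A prior with no residual variance is not moved by new data.
        return prior_mean, 0.0
    w_prior, w_local = 1 / prior_sd ** 2, 1 / var_local
    post_mean = (w_prior * prior_mean + w_local * r_local) / (w_prior + w_local)
    post_sd = (1 / (w_prior + w_local)) ** 0.5
    return post_mean, post_sd


# Hypothetical published validities for one predictor: (observed r, sample size).
studies = [(0.35, 120), (0.22, 300), (0.41, 85), (0.18, 150), (0.30, 210)]
prior_mean, prior_sd = bare_bones_prior(studies)
post_mean, post_sd = update_with_local_study(prior_mean, prior_sd, r_local=0.25, n_local=200)
print(f"prior {prior_mean:.3f} (SD {prior_sd:.3f}); posterior {post_mean:.3f} (SD {post_sd:.3f})")
```

A small prior SD means the published validities are nearly consistent once
sampling error is removed, so a single new Army study shifts the estimate
only slightly.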

Then, an appropriate set of experts will be asked to complete the matrix,
as described in the rationale section of this subtask. The experts should
be industrial, measurement, or differential psychologists with experience
and knowledge in personnel selection research and/or application.

Some 8 to 13 of the experts will be drawn from the PDRI, HumRRO, and AIR
researchers working on this task: the task leader (1), leaders of the
various predictor domain literature searches (2-3), the consultant experts
for the predictor domains (3-5), and several of the research associates who
have worked on the literature search (2-4). ARI psychologists will also be
asked to complete the ratings.

The ratings can be completed without a meeting of the raters; telephone
communication is generally sufficient. Completing the ratings should
require one to two days.

The ratings will then be analyzed by Task 2 staff as described in the
rationale section. The data will be submitted to Task 1 staff for inclusion
in the data base. We note here that the analysis of the expert ratings will
be the primary means of selecting the technically best set of predictors
for further development. However, we will carefully examine the
possibility of including in the Trial Battery some measures that were
included in the Preliminary Battery, even if these measures do not survive
technical review. Such inclusion is desirable in order to have
well-defined, reliable marker variables for factor analyses of the Trial
Battery, which will occur in Subtask 8. In addition, comparisons of
predictive versus concurrent validity can be made with these common
predictors when they are administered twice to the FY83/84 cohort.

The technical review for computer-administered perceptual/psychomotor
measures will have three steps. First, the relevant predictor review forms
from the literature search, i.e., those describing perceptual/psychomotor
measures, will be critically evaluated just as in the formal judgment
process. Second, Task 2 staff will visit Army and other services' locations
that are currently engaged in developing or validating such measures. These
site visits will include, at a minimum, Fort Rucker, Fort Knox, and the Air
Force Laboratory in San Antonio. The general purpose of these visits is to
learn as much as possible about current development of these measures. More
specifically, we will focus on hardware configurations; available software
and problems in software development; psychometric properties of these
types of measures, especially with regard to reliability and practice
effects; criterion-related validity data; and information on the kinds of
job performance constructs that are being predicted and/or simulated.
Third, after the completion of these two activities, Task 2 staff will
review the available information and identify the perceptual/psychomotor
constructs that appear to be most worthy of further development. This
review will use the 13 evaluation criteria used for all other predictor
measures, plus other criteria that may emerge as a result of the first two
steps. These criteria will be applied against written descriptions of the
perceptual/psychomotor computer measures derived from the first two
activities in this review process. Furthermore, we will inquire, during the
above-mentioned site visits, about the availability of the researchers at
those sites for participation in this review process. Several members of
Task 1 staff possess expertise in this area, and they will participate in
this review step as well.


Finally, a literature review will be prepared as part of this subtask. It
will summarize the major findings from our literature search and review,
and will be prepared in accordance with the outline submitted to ARI as
part of Subtask 2.

Subtask 5: Cost/Administrative Practicality Review

Rationale. At this point, we will have a set of predictor constructs that,
collectively, are the best possible from a technical point of view. To
avoid wasting time and money on validating predictors that cannot
ultimately be administered operationally, a cost/administrative
practicality review must be undertaken.




Procedures. Information about the set of predictors that passed technical
review will be prepared. The information will include a definition of the
variable or construct; a content description or item examples; the method,
time required, and costs of administration; the time, costs, and nature of
development efforts; and any other information necessary for a review of
the practicality of a predictor. No psychometric or technical information
will be included, since the predictors will already have passed technical
review.




A cost/administrative practicality review panel will be selected. Members
should have, collectively if not individually, knowledge of the field
operation of recruitment, selection, and classification of soldiers;
privacy concerns; human subjects review policies; and other administrative
or practical issues relevant to pre-induction testing. Preliminary
inquiries indicate that appropriate panel members can be identified and are
willing to serve on such a panel. Some members of this panel should come
from the support officers who will handle troop requests. In this way, the
military personnel assisting in securing troop requests will gain early
familiarity with the types of predictors to be field-tested in later
subtask activities.

Panel members will receive the predictor information about two weeks prior
to a two- to three-day workshop, along with instructions and forms for
recording their initial judgments about the predictors. At the workshop,
the task leader and team leaders will work with the panel members to retain
or delete predictors based on estimates of cost and practicality. An
attempt will be made to strike a balance between the estimated predictive
effectiveness and psychometric adequacy of a predictor and the estimated
practicality and cost of its development and operational administration.

Predictors passing this review will move forward to the next subtask.
Note, however, that the predictors not passing this review possess
technical merit but are presently too costly to develop further in
Project A. The conditions that led to these cost decisions are certainly
subject to change. Furthermore, some of these predictors could be developed
in other research efforts. The point is that these "rejected" predictors
actually constitute a separate research agenda.


Subtask 6: Initial Development of Predictors for the Trial Battery


Rationale. The predictors surviving technical and cost reviews will require
varying degrees of further development. The purpose of this subtask is to
complete the initial developmental steps, try out the predictors on small
samples of soldiers, and revise the predictors. As described in Procedures
for this subtask, we think an iterative process of several tryouts and
revisions will be the most efficient way to complete this work.


Procedures. Two different types of development efforts will be required.
One effort will involve writing and revising paper-and-pencil cognitive and
non-cognitive measures. The second will involve developing
computer-administered versions of perceptual/psychomotor measures and
computer-administered versions of some of the paper-and-pencil measures.
We plan to begin work on the computer-administered perceptual/psychomotor
measures immediately after the technical review process for those types of
measures, in July, 1983. Development of paper-and-pencil measures will
begin later, after the cost/administrative practicality review (in
December, 1983). The early start on computer-administered measures is
desirable because of the greater amount of development work required. It
is also possible that a third type of developmental effort will be
required. This effort would focus on the development of tests of
psychomotor abilities using apparatus that does not require computers. As
we have already stated, we do not think it advisable to test psychomotor
abilities without computer administration. Thus, we believe it is unlikely
that this third developmental effort will be required. If it does turn out
to be required, however, this effort would begin at approximately the same
time as paper-and-pencil measure development begins, and will follow the
same timetable for tryouts and revisions as that effort.

The sequence of activities in the development of computerized
perceptual/psychomotor measures will be as follows:

1. Identifying and obtaining the appropriate hardware for initial
development, as determined by the technical review. A minimum amount of
hardware will be obtained for these initial efforts.

2. Writing software, or modifying software obtained from other development
efforts (identified in the field visits that are part of the technical
review in this area), to measure the constructs identified in the
technical review.

3. Trying out the measures on a small sample of soldiers (N=10). This will
occur in November, 1983, and the tryout could take place at a MEPS site.
The focus is on debugging the measures, obtaining feedback from soldiers on
the acceptability of the measures, and getting a reading on the
administrative problems involved. The only facilities required will be a
classroom with furniture appropriate for setting up the computer hardware
(tables and chairs) and standard household electrical power.


4. Revising the software and/or hardware in light of the initial tryout
results.

5. Conducting a second tryout on a slightly larger sample (N=30). This will
occur in January, 1984, and it would be preferable to try out the measures
at a MEPS site.

6. Writing a preliminary report on the results of the initial development
and tryouts. The purpose of this report is to advise ARI of results to date
and provide a judgment about further developmental work and associated
hardware costs. This report will be submitted by March 1, 1984, in order to
give ARI and the contractors time to make a judgment about the costs and
administrative feasibility of including these measures in the pilot test of
the Trial Battery (Subtask 7).

7. Should the decision be to go ahead with these measures, further software
development will be carried out, and arrangements for the necessary
hardware to carry out the pilot test in Subtask 7 would begin.




8. A final tryout of the measures would take place in May, 1984, on a
sample of forty MEPS candidates, concurrent with the tryout of the
paper-and-pencil measures that may be objectionable in some way and of the
computer-administered versions of some of the paper-and-pencil measures.
(Reception stations could substitute for MEPS in steps 3, 5, and 8.)

The development of computer-administered versions of some of the
paper-and-pencil measures would begin after the paper-and-pencil versions
were initially developed, in approximately February, 1984, and these
measures would be tried out during the April, 1984 tryout of the
paper-and-pencil measures (see below). Although these computerized measures
are conceptually distinct from the computerized psychomotor/perceptual
measures, we think their inclusion in the pilot test is linked with the
more important decisions about the scope, nature, and expected payoff of
using the psychomotor/perceptual measures in the pilot test, and in a major
sense is dependent on those decisions.

Finally, we intend to keep ourselves fully informed of developments in the
Army and other services with regard to computer-assisted testing hardware
and software. These developments will be very important to the development
efforts just outlined and, especially, to the ultimate feasibility and
practicality of any computerized measures under development.

The activities devoted to the initial development of paper-and-pencil
measures of non-cognitive and cognitive abilities will be as follows:




1. Preparation of specifications for measures. We will have accumulated a
great deal of information that will be very helpful for writing or
modifying items to be included in these measures. The literature search and
technical review will provide the basic information. This information will
be organized by the Task 2 leader and domain leaders into a package for
each measure that will contain a concrete, specific definition of the
construct to be measured, examples of existing measures of the construct,
specification of the item format and response format, the desired number
of items, and a description of the desired administration procedures.

2. Task 2 staff will then write items for the measures; these will be
reviewed by the domain leaders and the task leader, and corrective feedback
will be given. Our expert consultants will review these measures as they
are developed.

3. The first tryout of the paper-and-pencil measures will occur in March,
1984, about 14 weeks after development begins. A total of 60 soldiers (from
FORSCOM or TRADOC) will be requested for one day. There are no special MOS
requirements, except that the soldiers should represent a diversity of
jobs. To the extent possible, both sexes and the major race groups should
be represented. As noted below, however, no significant statistical
comparisons of group performance will be carried out on data collected
during the tryout. The intention here is to avoid trying out the measures
on a homogeneous group of soldiers and instead to obtain data from a
heterogeneous group. These soldiers will be broken into two groups of 30
soldiers each. They will attend two sessions, one in the morning and one in
the afternoon. Soldiers will complete different sets of predictors at the
two sessions. After each session, they will complete short evaluation forms
about the predictors and will be asked about their reactions to the
measures.


The focus of these tryouts is on "debugging," i.e., the efficiency,
practicality, and understandability of test items, instructions, and
administration procedures. Only very simple statistical analyses
(e.g., simple frequency counts of items completed) will be conducted
on these data, so there is no need for larger sample sizes. The target
group size of 30 is based on our past experience with developmental
efforts. A group of this size provides sufficient diversity but is not
so large as to inhibit the direct feedback from participants that is
necessary at this stage of development.
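The "simple frequency counts of items completed" could be tallied as in the following minimal Python sketch. The data layout (one list of answers per soldier, with None marking an unanswered item) is an assumption made only for illustration, not the format the tryouts will use.

```python
from collections import Counter

def items_completed_counts(responses):
    """Count how many soldiers answered each item position.

    responses: one list of answers per soldier; None marks an item
    left blank (a hypothetical coding used only for illustration).
    """
    counts = Counter()
    for answers in responses:
        for item_index, answer in enumerate(answers):
            if answer is not None:
                counts[item_index] += 1
    return counts

# Hypothetical answers from three soldiers to a four-item form.
responses = [
    ["A", "C", None, None],
    ["B", "C", "D", None],
    ["A", "B", "D", None],
]
counts = items_completed_counts(responses)
print([counts[i] for i in range(4)])  # [3, 3, 2, 0]
```

A sharp drop in counts toward the end of a form, as with the last item here, would suggest that the time limit or the instructions need attention.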

Two normally equipped classrooms, i.e., with student desks or tables and
chairs, will be required to conduct the tryouts.

Information from this tryout will be used to revise the measures. A
second tryout will be conducted approximately one month after the
first, in the same manner as the first. The measures will again be
revised.

A third tryout of some of the paper-and-pencil measures on applicants
at a MEPS station should occur approximately one month after the
second tryout. The purpose of this tryout is to get an early reading
of the reactions of candidates for Army enlisted ranks to those
measures that may be objectionable in some way (some biographical or
personality measures). As we stated above, this sample will also
complete the computerized measures. We think a sample of 40
candidates for four hours should be sufficient. If possible, the MEPS
station selected should be typical, in the sense that it should not
be processing an atypical population of candidates. The 40
candidates should be heterogeneous with regard to race and sex. As in
the other tryouts, no major statistical analyses of their data will be
conducted.

All measures, computerized and paper-and-pencil, will be given final
revisions based on this last tryout, in preparation for Subtask 7, the
pilot test.

Subtask 7: Pilot Tests of Predictors for the Trial Battery (New Predictors) in the Field

Rationale. The pilot testing subtask is designed to answer three important
questions: (1) Do the newly developed predictor measures work
administratively for fairly large samples? (And, related to this, how are
the measures received by soldiers?) (2) What are the item and test
characteristics of the new measures, i.e., item response frequencies;
test, scale, and item reliabilities; stabilities (test-retest
reliabilities); and test score distributions? (3) How do the new tests
covary with each other, with current pre-induction measures, and with
available criterion information?


In addition, fakeability of responses will be of some concern for biodata
and vocational interest measures, and research addressing this concern is
appropriate at this stage of the effort. One sample of soldiers will be
asked to answer these items so as to make themselves "look as good (or
well-qualified) as possible" and another to "look as bad (or
least-qualified) as possible." A possible variation on the instructional
set will be to ask soldiers to "answer so that you would be chosen for a
job in the electronics field" or some other target field.

A second examination of fakeability will require administration of the
bio/vocational interest predictors to a group or groups of candidates at
MEPS sites. These persons will presumably be motivated to do as well as
possible. They will, of course, need to be debriefed after taking the
battery so that they understand that it had no impact on actual
selection or classification decisions for them.

One other special effort will be required. Practice effects, beyond the
normal test-retest phenomena, may be of concern for some of the
perceptual/psychomotor tests. Therefore, several test opportunities in
quick succession should be given to a sample of subjects. It is possible
that practice is a necessary part of the administration protocol for
tests of this type; thus, it will be necessary to evaluate such practice
effects to determine when the most stable between-subjects measure of
psychomotor abilities may be obtained.
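Finding the point at which practice has stabilized between-subjects differences can be approached in several ways. One simple, illustrative approach (an assumption here, not a procedure specified in this plan) is to track two things: the mean score on each trial, which shows the practice gain, and the correlation between adjacent trials, which shows whether subjects' relative standing has settled. The Python sketch below does both. The .90 stability threshold and the scores are hypothetical.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def practice_profile(scores, threshold=0.90):
    """scores[s][t] is subject s's score on trial t.

    Returns the mean score per trial, the correlation of each trial with
    the next, and the first trial (1-based) whose correlation with the
    next trial reaches `threshold` (None if no trial does). The threshold
    is a hypothetical cut-off chosen for illustration.
    """
    n_trials = len(scores[0])
    trials = [[row[t] for row in scores] for t in range(n_trials)]
    means = [mean(trial) for trial in trials]
    adjacent_r = [pearson(trials[t], trials[t + 1]) for t in range(n_trials - 1)]
    stable_from = next((t + 1 for t, r in enumerate(adjacent_r) if r >= threshold), None)
    return means, adjacent_r, stable_from

scores = [  # hypothetical: 4 subjects x 3 trials
    [10, 18, 20],
    [12, 14, 16],
    [11, 15, 17],
    [15, 20, 22],
]
means, adjacent_r, stable_from = practice_profile(scores)
print([float(m) for m in means])          # [12.0, 16.75, 18.75]
print([round(r, 2) for r in adjacent_r])  # [0.5, 1.0]
print(stable_from)                        # 2
```

In this made-up example, scores still rise from trial 2 to trial 3, but relative standing is already fixed, which is the between-subjects stability the text is concerned with.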

Fakeability and practice effects are only two factors that can have
undesirable (or unknown) effects on test scores. Others are order effects,
such as fatigue, and context and situation effects, such as lighting,
administrator, amount of work space, etc. We have isolated practice and
fakeability (or deception and malingering) as two important extraneous
factors to be examined early in the research. In general, our approach to
these extraneous factors will be to exert direct control whenever
possible, and to conduct appropriate research to estimate the magnitude of
such effects if direct control is not possible. Situational effects can
usually be directly controlled by specifying the physical conditions of
test administration and ensuring adequate training for all administrators.
Order effects can be estimated or controlled. That is, all subjects can be
given tests in the same order, which would be adequate unless there is an
interaction between subjects and order. The only way to estimate such
interaction effects, or certain other effects such as deception or
deliberate bias, is to carry out research similar to that outlined for
fakeability and practice effects. (We believe the proper place for
examination of order effects is later in the project, when the final,
smaller version of the new predictor battery is being put together. This
later version will contain tests more nearly like those that would be
operationalized, and the order information would be most useful then.)

All research on the effects of extraneous factors is concerned with the
reliability, or more accurately, the generalizability of the measurement
process (Cronbach, Gleser, Nanda, & Rajaratnam, 1972). (This concept is not
the same as the generalization of validity, which is concerned with the
degree to which a test's relationships with other variables are the same
across extraneous factors; rather, this use refers to the extent to which
persons' test scores will be the same across extraneous factors.) Ideally,
all extraneous factors would have little or no effect on a person's test
score, but these effects must be examined to determine whether that is the
case and, if it is not, to determine what procedures can be brought to bear
to reduce the impact of such effects. (It is also possible, as mentioned
previously, that practice on performance measures is an important part of
the administration protocol; that is, scores on later, more experienced
trials may be better at predicting targeted criterion outcomes.)

We plan to employ the generalizability theory approach to the study of
predictor reliability, as outlined in Cronbach et al. (1972). This approach
calls upon the researcher to clearly define the universe of
generalizability, that is, to identify the facets that he wishes to
generalize over (such as practice, order, administrator, or situation),
and then to devise research designs that include those facets. ANOVA is
then used to estimate the variance components associated with each facet,
enabling the investigator to determine the limits of generalizability of
test scores and appropriate directions to proceed in order to increase the
reliability or generalizability of the measurement process. Even if the
actual ANOVA computations and variance component estimates are not
completed, the analytic exercise of identifying all facets and thinking
about their possible effects aids in defining samples and setting up test
administration procedures.

The pilot test also affords the opportunity to look at the relationships of
the experimental predictors to current pre-induction measures, training
school performance, and job performance, provided those scores are
available for the soldiers who participate in the pilot test. Use of
soldiers from the FY81/82 cohort would provide the best opportunity for
these analyses. These soldiers will have on file scores on their
pre-induction measures and, perhaps, training school scores and some
Army-wide measures. Task leaders from Tasks 1, 3, and 4 will be consulted
about obtaining such data for the selected sample.

In addition to providing data necessary to refine the experimental
predictor battery, these pilot tests will also provide a "shakedown" of the
administrative procedures, coordination, and communication of the research
teams, providing valuable information for the much larger cohort
administration that will occur after revisions to the battery.

Procedures. Before detailing procedures at the pilot test sites, we will
say a few words about sample sizes. There are essentially two types of
concerns about sample size for the pilot test. First, we want a sample
large enough to obtain stable estimates of covariance between
predictors. For example, N = 480 is sufficient to detect a correlation
of .09 as significantly different from zero at the .05 alpha level
(Walker & Lev, 1953, p. 252) and provides a 95 percent confidence
interval of ±.09 around Fisher's z transformation of r. For a sample
value of .65, the confidence interval would cover the range of r's from
.60 to .70. This degree of precision in estimating predictor covariances
is sufficient for this stage of the research. Second, as we noted in the
Rationale section for this subtask, we wish to carry out studies of
fakeability and practice effects. In the main, the analysis of these
effects will consist of tests of significance of mean differences
between groups or simple ANOVAs.
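The correlation figures above can be reproduced with a short Python sketch. It relies on the standard large-sample result that Fisher's z = atanh(r) has a standard error of 1/sqrt(N - 3). The two-tailed critical value of 1.96 is the usual normal approximation, so the critical r it gives may differ slightly from a tabled value.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z.

    Returns (lower r, upper r, half-width of the interval on the z scale).
    """
    z = math.atanh(r)                        # Fisher's z transformation of r
    half_width = z_crit / math.sqrt(n - 3)   # standard error of z is 1/sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width), half_width

def critical_r(n, z_crit=1.96):
    """Smallest |r| significantly different from zero (two-tailed .05), via Fisher's z."""
    return math.tanh(z_crit / math.sqrt(n - 3))

lo, hi, hw = fisher_ci(0.65, 480)
print(round(hw, 3), round(lo, 2), round(hi, 2))  # 0.09 0.6 0.7
print(round(critical_r(480), 4))                 # 0.0895
```

The half-width of about .09 on the z scale maps back to an interval of roughly .60 to .70 around r = .65, matching the text.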




Precise estimates of necessary sample sizes for such analyses cannot be
made at this time using the usual power formulas (Walker & Lev, 1953)
because some of the parameters needed to make the estimates are unknown.
(The variances of the newly developed experimental tests cannot, of course,
be even crudely estimated, nor, likewise, can the size of practically
significant score differences between groups on the experimental tests.)
However, we can make realistic assumptions about these parameters now in
order to make the best possible estimates of the samples required. Suppose
we wish to detect mean score differences between groups of interest (say,
a "regular" group and a "fake good" group) on the order of .25 standard
deviations, with an alpha (probability of Type I error) level of .05 and a
beta (probability of Type II error) of .30. We can then compute the size
of the required sample according to a derivation of the formulas given by
Walker & Lev (1953, p. 166):

$$N = \left[\,1.414\,\frac{\sigma}{d}\,\left(Z_\alpha + Z_\beta\right)\right]^2$$

where $N$ = sample size (per group)
$\sigma$ = standard deviation of the measure
$d$ = size of score difference desired to detect
$Z_\alpha$ = Z score for specified level of alpha
$Z_\beta$ = Z score for specified level of beta

Substituting the values for our case, the computation is:

$$N = \left[\,1.414\,(4.00)\,(1.645 + .253)\right]^2$$

$$N = 115.24$$
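The Python sketch below evaluates the same formula, writing 1.414 as its exact value sqrt(2), which is why it prints 115.28 rather than 115.24. It also computes the normal quantiles directly with the standard library's `statistics.NormalDist`. Note that the value .253 used above is the normal deviate for a cumulative probability of about .60 (a beta of about .40). The deviate for beta = .30 is about .524, and with that value the same formula gives roughly 151 per group.

```python
import math
from statistics import NormalDist

def n_per_group(sd_over_d, z_alpha, z_beta):
    """N = [sqrt(2) * (sigma/d) * (Z_alpha + Z_beta)]^2.

    Per-group sample size for detecting a mean difference d between two
    groups sharing standard deviation sigma (one-tailed test).
    """
    return (math.sqrt(2) * sd_over_d * (z_alpha + z_beta)) ** 2

# Values used in the text: d = .25 SD, so sigma/d = 4.00.
print(round(n_per_group(4.0, 1.645, 0.253), 2))  # 115.28

# Exact quantiles for one-tailed alpha = .05 and beta = .30.
z_a = NormalDist().inv_cdf(0.95)
z_b = NormalDist().inv_cdf(0.70)
print(round(z_b, 3), round(n_per_group(4.0, z_a, z_b)))  # 0.524 151
```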




Therefore, sample sizes of 115 for the special studies of practice and
fakeability will enable us to detect real differences as small as
one-quarter standard deviation with a probability of .70 (1 - beta) and
with the alpha level at .05. Alpha is the probability that we will decide
there is a difference in scores between the two groups when there is no
real difference. As the formula shows, decreases in the levels of alpha
and beta, or in the size of the score difference one wishes to detect,
will all result in increases in the required sample size. At this point,
we think the levels stated above are adequate for pilot test purposes.
With regard to the fakeability study, we will have developed a priori
scales intended to detect faking, and we think a sensitivity to mean
differences on the order of .25 SD is sufficient for such scales as well
as for scales measuring predictor constructs. The percentage of overlap
between two distributions with a .25 SD mean difference is 90 percent
(Dunnette, 1966, p. 143). (Samples of 115 would not be sufficient for
doing empirical keying to detect faking, but as just stated, that is not
intended at this point.) We are less certain about the sufficiency of an
N of 115 for an investigation of practice effects, but we do not now
possess the information necessary to make a more precise estimate.
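The 90 percent overlap figure can be checked with Python's standard library. `statistics.NormalDist.overlap` (available since Python 3.8) returns the overlapping coefficient of two normal distributions, that is, the area shared by both density curves. For equal variances this equals 2 * Phi(-d / 2).

```python
from statistics import NormalDist

# Overlap of two unit-variance normal distributions whose means differ by .25 SD.
ovl = NormalDist(0.0, 1.0).overlap(NormalDist(0.25, 1.0))
print(round(ovl, 3))  # 0.901

# Equivalent closed form for equal variances: 2 * Phi(-d / 2).
print(round(2 * NormalDist().cdf(-0.25 / 2), 3))  # 0.901
```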

The implications of these estimates of sample size requirements for pilot
test data collection are these:

1) An N of 480 for estimating correlations between predictors.

2) An N of 115 for the special studies of practice and fakeability.


We propose the following method of obtaining the pilot test data. The data
would be collected at one post, if at all possible on the FY81/82 cohort.
One post is desired because, if physical/psychomotor test apparatus and
microcomputers are required for administration, it will be extremely
costly to have duplicate sets of these, and they are not easily
transported. Furthermore, the time frame for pilot test administration
does not appear to provide sufficient time to travel to several posts in
succession. The FY81/82 cohort is desirable because these soldiers will
still be in their first tour but will have completed training. They will
therefore provide the opportunity to obtain scores on current
pre-induction measures and in training for comparison and analysis with
the experimental predictors.

The data collection method requires four distinct episodes of data
collection. The first episode is the collection of the main body of data
on a sample of 480 soldiers over a two-week period. As we now see it, each
soldier will complete all predictor measures over a two-day period, with
the two days separated by one week. For purposes of this plan, we are
assuming that earlier research (in Subtask 6) has indicated the
desirability of pilot testing computer-administered measures and that
sufficient hardware resources have been obtained. Furthermore, we are also
assuming that some form of physical ability testing apparatus will be
pilot tested. If either or both of these events do not occur, then the
procedures outlined here will be much less complex and demand less in the
way of soldier time. Given these assumptions, then, there will be three
testing sessions:


a "paper-and-pencil" session in which the non-cognitive measures of
vocational interests and biodata, as well as the cognitive ability
measures, will be given. This session should require four hours, but may
require six. Forty-eight soldiers would attend each session.


a "computer" session in which the computer-administered measures of
perceptual/psychomotor abilities and the alternative computerized
measures of some of the paper-and-pencil battery will be given. Two
groups of 12 soldiers would constitute a "session." At this time, we
think two sessions of 24 soldiers each can run each day, given the
availability of twelve microprocessors.




an "apparatus" session in which the perceptual/psychomotor tests will
be administered. The session is so named because some of the tests will
involve apparatus. We plan to process 24 soldiers through each session
in about three hours.


Each soldier will complete the "paper-and-pencil" session on one day, and
the "computer" and "apparatus" sessions (one in the morning and one in the
afternoon) on the second day. Using this method, 480 soldiers will complete
all predictors over a two-week period, with each soldier away from normal
duties for 1-1/2 to 2 days. Ideally, these 480 soldiers will represent a
variety of MOS.
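As a rough check on the throughput figures above, the following Python sketch computes how many working days each session type needs to process 480 soldiers. The sessions-per-day values for the paper-and-pencil session (one) and the apparatus session (two) are assumptions inferred from the stated session lengths; the plan gives the per-day figure only for the computer session.

```python
import math

def days_needed(n_soldiers, per_session, sessions_per_day):
    """Working days needed to put n_soldiers through one session type."""
    return math.ceil(n_soldiers / (per_session * sessions_per_day))

# (soldiers per session, sessions per day). Paper-and-pencil and apparatus
# sessions per day are assumptions; the computer figure is from the plan.
plan = {
    "paper-and-pencil": (48, 1),
    "computer": (24, 2),
    "apparatus": (24, 2),
}
for name, (size, per_day) in plan.items():
    print(name, days_needed(480, size, per_day))  # 10 days each
```

Each session type comes out at 10 working days, consistent with a two-week period of five-day weeks.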




The second episode will be the administration of the measures for which
practice effects are a concern. These will primarily be the "apparatus" and
"computer" sessions, though not all measures in these two sessions may be
involved. This episode will require 115 soldiers not included in the
original 480, and will require one day from each soldier. It will not be
possible to take race and sex differences into account here, but it would
be best to have an approximately equal number of males and females in
this group. This episode will take place in the week following the first
episode, on the same post, for the reasons of practicality and economy
already cited.

The third episode takes place at a MEPS site. One hundred and fifteen
applicants will complete the parts of the paper-and-pencil battery that
are of concern with regard to fakeability or motivational set (the
non-cognitive measures). We will also administer some of the "computer"
measures in order to gauge their practicality for MEPS and their
acceptability to applicants. This group should also be equally split
between males and females, if possible. (We are assuming no race or sex
interaction with motivation. Sample sizes must be multiplied by at least
four if this assumption is not made. At present, we think this assumption
is defensible.)

The fourth episode is the collection of test-retest and fakeability data.
The test-retest data will be collected, necessarily, at the same post as
the first episode. Each soldier in the original sample of 480 will be asked
to repeat one of the three sessions (a half-day for each soldier), which
yields a sample of N = 160 per session for computing stability
coefficients for each measure. The soldiers will be scheduled, if
possible, so that race and sex composition is balanced across the three
sessions. An additional, separate sample of 230 soldiers will be required
to complete the measures for which fakeability is a concern (the
non-cognitive parts of the paper-and-pencil battery). One group will be
instructed to "fake good" and one group to "fake bad." These groups
should, if possible, be equally split between males and females. A
half-day of each soldier's time will be required.
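The retest scheduling is meant to balance race and sex composition across the three sessions. One simple way to do this, sketched below in Python, is stratified round-robin assignment: soldiers are grouped by stratum and dealt in turn to the three sessions. Each stratum is then split as evenly as possible, and because the deal continues across strata, 480 soldiers yield exactly 160 per session. The soldier IDs and stratum labels are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import cycle

def assign_retest_sessions(soldiers, sessions=("paper-and-pencil", "computer", "apparatus")):
    """Stratified round-robin assignment of soldiers to retest sessions.

    soldiers: list of (soldier_id, stratum) pairs, where stratum is a
    hypothetical label such as a (sex, race group) tuple.
    Returns a dict mapping soldier_id to session name.
    """
    by_stratum = defaultdict(list)
    for soldier_id, stratum in soldiers:
        by_stratum[stratum].append(soldier_id)
    assignment = {}
    session_cycle = cycle(sessions)  # one continuous deal keeps totals balanced
    for stratum in sorted(by_stratum):
        for soldier_id in by_stratum[stratum]:
            assignment[soldier_id] = next(session_cycle)
    return assignment

# Hypothetical roster: 480 soldiers with made-up (sex, group) strata.
roster = [(i, ("F" if i % 3 == 0 else "M", f"group{i % 4}")) for i in range(480)]
assignment = assign_retest_sessions(roster)
print(Counter(assignment.values()))  # 160 soldiers in each session
```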


Task 2 staff will administer all measures, but we will require the
assistance of one to three Army personnel to proctor sessions for the main
data collection (episode one). The main data collection requires three
classrooms: one with student desks or tables and chairs for 50 persons,
one cleared for placement of apparatus, and one with tables and chairs and
sufficient electrical outlets for 12 microprocessors. The second episode
requires two classrooms, one for the computers (with electrical outlets)
and one for apparatus.


The  third  episode,  at  MEPS,  requires  one  large  classroom  with  desks  and  one 
room  with  tables  and  chairs  and  outlets  for  the  computers.  The  fourth 
episode  requires  the  same  facilities  as  the  first  episode. 


Finally, to collect data on the clarity and acceptability of measures, test
administration staff will administer brief feedback forms to the subjects
after each testing session, as appropriate, and will conduct post-session
interviews with random samples of the subjects (about 5-10 percent). The
feedback forms and interviews will focus on the clarity of test items,
formats, and instructions; the perceived "validity" and "fairness" of the
tests; objectionable items; etc. Each team will write a report outlining
its findings from these investigations. We have inquired at FORSCOM about
the feasibility of the activities outlined above and have been informed
that they could be accomplished.


Subtask  8:  Analysis  of  Trial  Battery  Pilot  Test  Data 

Rationale. This subtask involves the performance of the analyses of the
pilot test data. These analyses are designed to answer the three primary
questions of concern that we outlined in the Rationale for Subtask 7
above. In the interest of brevity and to avoid redundancy, we refer the
reader to that section.

Procedures. Analyses of the predictor responses will include item
analyses; test score distributions (frequency distributions, means,
standard deviations, skewness, kurtosis, etc.); and internal consistency
and test-retest reliabilities. Item and test score differences for the
major ethnic/sex groups will be examined. Item factor analyses may be
performed for predictors that have unknown or ambiguous factor structures.
Correlations between predictor measures, including ASVAB scores, will be
computed and factor analyses performed as appropriate. As noted earlier in
Subtask 4, the inclusion of "marker" variables in the Trial Battery will
make factor analyses more easily interpretable and allow a better
understanding of newly developed measures. All these results will be used
to identify deficiencies in items and tests, such as poor score
distributions, low reliabilities, redundancy in the battery, and race or
sex differences in item and score distributions. (We should note that we
have been informed that
it is not possible to request specific numbers by race and sex, so some of
these analyses may not be possible at this stage of the research.)
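Internal consistency is commonly summarized with Cronbach's coefficient alpha. The minimal Python sketch below computes it from a subjects-by-items score matrix. The item scores are hypothetical, and the plan does not specify which internal consistency index will be used.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's coefficient alpha.

    item_scores[s][i] is subject s's score on item i.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores[0])
    items = [[row[i] for row in item_scores] for i in range(k)]
    totals = [sum(row) for row in item_scores]
    item_var_sum = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))

# Hypothetical 0/1 item scores: 4 subjects x 3 items.
items = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
print(round(cronbach_alpha(items), 2))  # 0.6
```

Population and sample variances give the same alpha here, because the n versus n - 1 factor cancels in the ratio.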

We will assess the "fakeability" and "practice" effects by comparing score
distributions between the experimental and control groups, i.e., the "fake
good" versus "fake bad" versus "regular" samples and the "several trials"
versus "single trial" groups. T-tests and analysis of variance or analysis
of covariance, as appropriate, will be the analytic methods. Suspect tests
and scales will be identified through these analyses and scrutinized for
improvement or deletion. In addition, special scales will have been
constructed to detect such response biases, and they will be evaluated to
determine whether they are indeed performing that function.
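As a minimal sketch of the group comparisons just described, the Python function below computes two statistics: the pooled-variance independent-samples t, and the standardized mean difference d (mean difference divided by the pooled SD). The d value can be compared directly with the .25 SD benchmark used in the sample-size discussion. The scores are hypothetical. The t value would still need to be referred to a t table with the returned degrees of freedom, because the standard library does not include the t distribution.

```python
import math
from statistics import mean, variance

def pooled_t_and_d(group_a, group_b):
    """Independent-samples t (pooled variance), effect size d, and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    diff = mean(group_a) - mean(group_b)
    t = diff / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    d = diff / math.sqrt(pooled_var)  # difference in pooled-SD units
    return t, d, df

fake_good = [5, 6, 7]  # hypothetical scale scores
regular = [3, 4, 5]
t, d, df = pooled_t_and_d(fake_good, regular)
print(round(t, 3), round(d, 2), df)  # 2.449 2.0 4
```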

Finally, relationships of the Trial Battery measures to available criterion
information, such as the soldiers' performance in training and performance
ratings, will be analyzed.

A primary objective of these analyses is the identification of redundancy
in the Trial Battery, both within the battery itself and between the
Trial Battery and current measures. This information is of particular
interest for guiding revisions to the Trial Battery (which take place in
Subtask 10). If computer-assisted measures were administered in Subtask 7
(recall that Subtask 6 includes a decision point about the inclusion of
such measures in the pilot test), we will also prepare a special report
about the Trial Battery experience with these measures, especially the
perceptual/psychomotor computerized measures. This report will also
address the cost implications of further development efforts, especially
costs associated with large-scale deployment and utilization of the
computer hardware that would be necessary in Subtask 11, administration of
the revised Trial Battery. The availability of sufficient hardware,
through Army or contractor resources, will no doubt play a major role in
decisions made about these measures at this point in the project.

As we noted in the overview section, Task 2 staff will have primary
responsibility for conducting the analyses outlined here. All the data
will be added to the LRDB by Task 1 staff, who will consult with Task 2
staff on the conduct of the analyses and provide appropriately constructed
data files for the analyses.

Subtask 9: Analyze Preliminary Battery: FY83/84 Cohort School and
Preliminary Battery Data

Rationale. The purpose of this subtask is to analyze the relationships
between the measures on the Preliminary Battery, current pre-induction
predictors, and training school performance. The results of these analyses
are input to Subtask 10 to guide revisions to the Experimental Battery.
Also, we will perform analyses on data collected during the first two or
three months of Subtask 3 with the Preliminary Battery in order to provide
guidance for development of measures in Subtask 6.


Procedures. Approximately 13,500 soldiers, an average of 3,350 from each
of four MOS, will have completed the Preliminary Battery (see Table 3, page
19). Also available will be measures of their performance in training
(from Task 3) and their scores on current pre-induction measures (from
Task 1). The analyses will be of four major types.

1. Covariances of Preliminary Battery measures -- correlation matrices
and factor analyses will be completed to identify redundancies within
the battery itself. As noted above, these analyses will be performed
on part of the sample to provide guidance for Subtask 6.

2.  Covariances  of  Preliminary  Battery  measures  and  current  pre-induction 
measures  --  correlation  matrices  and  factor  analyses  of  the  two  sets 
of  measures  will  be  completed  to  identify  redundancies  across  the  two 
sets  of  measures.  Again,  early  analyses  on  part  of  the  sample  will  be 
performed. 

3. Prediction of training school performance -- bivariate correlations
between training measures and Preliminary Battery measures, and between
training and current pre-induction measures, will be completed.
Multiple regressions including current and Preliminary Battery
measures will be completed to identify the amount of incremental
validity contributed by Preliminary Battery measures.

4. Classification analyses -- the sample is composed of soldiers from
four MOS of relatively different occupational types: Radio Operator
(05C), Admin Specialist (71L), Vehicle and Generator Mechanic (63B),
and Tank Crewman (19E/K). This provides the opportunity, at a minimum,
to examine score differences between MOS groups on the Preliminary
Battery measures in order to examine their value for classification
purposes. More sophisticated analyses, such as multiple discriminant
function analyses or centour analyses, are certainly possible given
the sample sizes. However, we must keep in mind that these
classification or group memberships have not been made optimally, so
caution must be exercised. An interesting possibility, however, would
be to identify "outliers" within these groups and examine their score
profiles. These outliers will be particularly interesting to follow up
in terms of tenure and job performance as those data become available
(in Subtask 12).

The overall goal of these analyses is to identify measures or constructs in
the Preliminary Battery that are efficient (correlate the least with other
pre-induction measures), effective (in this case, predict training
performance), and provide incremental validity (beyond that produced by the
ASVAB).
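Efficiency and incremental validity can be illustrated with the standard formula for the squared multiple correlation of a criterion on two predictors, computed from their zero-order correlations. In the Python sketch below, predictor 1 stands for an ASVAB composite and predictor 2 for a new measure; all correlation values are hypothetical. The second call shows why a new measure that correlates highly with the ASVAB may add little or nothing, even when its own validity is respectable.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of criterion y on predictors 1 and 2,
    computed from the three zero-order correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

def incremental_validity(r_y1, r_y2, r_12):
    """Gain in R^2 from adding predictor 2 to predictor 1 alone."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2

# Hypothetical: ASVAB composite validity .50, new measure validity .30.
print(round(incremental_validity(0.50, 0.30, 0.20), 3))  # 0.042
print(incremental_validity(0.50, 0.30, 0.60))  # about 0: no incremental validity
```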

Task 2 staff will have primary responsibility for all the analyses outlined
above, but Task 1 will have major input to the analyses outlined in points
3 and 4 above. Task 1 staff will add these data to the LRDB and will
provide appropriate data files for analyses. This latter point will be
especially crucial for the early analyses of the Preliminary Battery.


Subtask 10: Prepare Revised Trial Battery for FY83/84 Cohort
Predictor/Performance Data Collection

Rationale. In this subtask, information from the Preliminary Battery
analyses and from analyses of the pilot test of the first version of the
Trial Battery will be integrated. This integration is essential for guiding
revisions to the Trial Battery. The other steps in this subtask are
obviously necessary: actually revising the Trial Battery, preparing
administration materials, and training test administrators.

Procedures. Analyses from Subtasks 8 and 9 will be used to guide revisions
to the Trial Battery so that we have the most efficient, effective, and
practically feasible set of measures. Task 2 staff will carefully examine
the results of these analyses and prepare a revision plan. This plan will
be submitted to ARI for review. We anticipate that extensive interaction,
i.e., meetings in Minneapolis and/or Washington, will be required to make
final decisions about the revisions to be made.

The Trial Battery measures will then be revised in accord with these
plans. We think the objective should be a Trial Battery that requires a
maximum of four hours to administer. This should make it feasible to
administer all the trial predictors to each member of the FY83/84 cohort
who will be tested in the next subtask. This is desirable in order to
have complete data on each subject for analytic purposes and to simplify
the data collection procedures, which must include the collection of job
performance criteria as well as the Trial Battery measures.


It may be necessary to hold some very small-sample tryouts of revised
measures after revisions have been made, to be sure that administration
procedures and time estimates are all in order. Our best estimate is that
three groups of 25 soldiers for one-half day each would be the maximum
requirement. Final administration procedures will then be designed.
Testing materials in the quantity necessary will be printed or procured.
Administration manuals with detailed instructions will be written.

Data collection teams made up of staff from Task 5 will be responsible for
collection of data from the FY83/84 cohort, including the Trial Battery
data. Task 2 staff will be responsible for providing the testing
materials, detailed administration procedures, and a training session for test
administrators. We will schedule the training session for the most
convenient site, probably Washington, and train all members of Task 5 staff
who will perform Trial Battery data collection.

An important part of this training and of the administration manuals
will be procedures to follow if anticipated problems occur. The pilot
test, conducted by Task 2 staff, should provide sufficient information to
prepare such contingency plans for almost all data collection problems.

Subtask 11: Monitor/Assist Administration of Revised Trial Battery to
FY83/84 Cohort

Rationale. At this point, prior subtasks have resulted in the development
of a Trial Battery of new pre-induction predictors, materials have been
prepared, and the Task 5 staff responsible for administration of the
battery have been trained. Task 2 staff, however, will monitor the
administration process and be prepared to offer assistance in overcoming
administration problems. The Trial Battery will be administered to about 500
soldiers in each of 19 MOS (for which such numbers are available; see Table
3), and job performance criterion data will be collected for these same
soldiers. This provides a data set for a concurrent validity study of the
Trial Battery. To the extent possible, this sample of soldiers should be
stratified on race and sex within MOS. (Some MOS will have no females.)
This stratification is necessary in order to carry out studies of test
fairness. Also, this sample will include as many soldiers as possible who
completed the Preliminary Battery in Subtask 3 during their AIT. This will
allow a predictive validity investigation of the Preliminary Battery.

In addition to the collection of these primary data, a number of
experimental projects will require the collection of data on some smaller
samples. This research will focus primarily on the extraneous factors that
might affect the generalizability of the measurement process, that is,
practice effects, subject condition effects, faking, etc., and will be very
similar to the research carried out in the pilot test. Indeed, the exact
nature of these projects depends very much on the outcome of the research
on motivation, practice effects, and fakeability conducted in Subtask 8.
For present estimation purposes, we assume that four research projects
will be required: test-retest reliability (stability) of measures, practice
effects for a selected subset of measures, fakeability, and the differences
between the scores achieved by the primary body of soldiers and the scores
achieved by soldiers at an early point in their career, i.e., at AIT. For
the stability research project, a sample of 500 will yield stability
coefficients with a standard error of about .04, which is sufficiently
precise. These soldiers will be a sub-sample of the primary sample, and
will complete the predictors of interest 30 days after their first
completion. For the research on practice effects and fakeability, separate
samples (other than the primary sample) will be required. If we make the
same assumptions as outlined in Subtask 8, three samples of 115 soldiers
will be required: one for practice effects, one for fake "good," and one
for fake "bad." If different assumptions are made, and the outcome of the
pilot test may dictate such assumptions, then these estimates will change.
One change that is perhaps more likely than others would be an examination
of the interaction of sex and/or race with practice or fakeability. If
this does occur, then the required sample size will be much larger; i.e.,
if four groups (black and white females and males) are of interest, then
the required sample size would be 320. The fourth investigation, of score
differences between "early career" soldiers and the primary sample (later
career soldiers), must be of sufficient size to provide stable estimates of
covariance so that the Trial Battery factor structures can be compared.
This is necessary in order to evaluate the extent to which maturation
affects the structure of scores on the Trial Battery so that, in turn, the
limits to be placed on the concurrent validity results can be estimated. A
sample size of 1,000 new recruits will provide sufficient stability for
these analyses (standard error of correlation coefficient of .03).
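The standard errors quoted for the stability sample (n = 500) and the new-recruit sample (n = 1,000) are consistent with the usual large-sample approximation for a correlation near zero, SE(r) = (1 - r^2) / sqrt(n - 1). The plan does not say which formula was used, so the Python sketch below is only a check under that assumption; the function names are illustrative.

```python
import math


def se_correlation(r: float, n: int) -> float:
    """Large-sample standard error of a Pearson correlation r from n cases."""
    return (1.0 - r * r) / math.sqrt(n - 1)


def se_fisher_z(n: int) -> float:
    """Standard error of Fisher's z-transformed correlation (independent of r)."""
    return 1.0 / math.sqrt(n - 3)


for n in (500, 1000):
    # r = 0 gives the largest standard error, so it is the conservative case.
    print(f"n={n}: SE(r)={se_correlation(0.0, n):.3f}, SE(z)={se_fisher_z(n):.3f}")
```

For n = 500 this gives about .045, and for n = 1,000 about .032, in line with the ".04" and ".03" figures above; larger true correlations give somewhat smaller standard errors.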


Finally, a few comments must be made about the limitations of, or
anticipated problems with, computer-administered measures. As already stated,
the entire Trial Battery will be targeted to take no more than four hours,
including computer-administered and apparatus measures. The actual
mechanics of getting soldiers to complete these measures are not only time
bound; they are also constrained by the number of computers and sets of
apparatus available at each testing site. As we mentioned in Subtask 7, we
estimated that the use of 12 computers would allow the processing of 48
soldiers per day for a two-hour computer battery. If the version of the
Trial Battery used at this stage requires one hour, then 96 soldiers could
be processed per day, which would possibly be fast enough to keep pace with
the administration of the other parts of the prediction battery. However,
these data will be collected at multiple sites at any one time, probably as
many as 8. Thus, assuming all that we have just stated, 96 microprocessors
(12 at each of 8 sites) could be required to have all subjects take these
measures. At any rate, this matter will be the focus of analyses and a
special report at an earlier stage of the project (see Subtask 8), so that
an informed decision about the inclusion of computerized measures at this
stage can be made.
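The capacity figures above follow from simple arithmetic. The plan does not state the length of a testing day; the sketch below assumes 8 hours of testing per day (which reproduces four two-hour sessions, or 48 soldiers on 12 computers), one soldier per computer, and that every site tests at the same time. The function names are hypothetical.

```python
import math


def soldiers_per_day(computers: int, battery_hours: float,
                     testing_hours_per_day: float = 8.0) -> int:
    """Soldiers one site can test per day with back-to-back computer sessions.

    Assumes (hypothetically) an 8-hour testing day and one soldier per computer.
    """
    sessions_per_day = math.floor(testing_hours_per_day / battery_hours)
    return computers * sessions_per_day


def computers_needed(computers_per_site: int, sites: int) -> int:
    """Total machines required when all sites test simultaneously."""
    return computers_per_site * sites


print(soldiers_per_day(12, 2.0))  # 48 soldiers/day for a two-hour battery
print(soldiers_per_day(12, 1.0))  # 96 soldiers/day for a one-hour battery
print(computers_needed(12, 8))    # 96 computers across 8 sites
```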

Procedures. Task 2 staff will be "on call" to answer questions from the
data collection teams (made up of Task 5 staff) about the main data
collection efforts throughout the period of Trial Battery administration. In
addition, they will collect the data for the first three of the four
special projects outlined above and will train Army personnel at the
appropriate TRADOC sites to collect the data for the fourth sample of 1,000
recruits. The nature of the facilities required will be the same as
outlined for the pilot test (classrooms with desks or tables and chairs,
cleared classrooms for apparatus, and electrical outlets in rooms where
computer tests are given). The number of such rooms at each site will
depend on the number of soldiers to be tested and the period of time
allowed to collect the data.

Completed test materials will be sent directly to Task 1 staff for addition
to the data base.

Subtask 12: Analyze FY83/84 Cohort Data: Trial Battery/Performance
Measures

Rationale. This subtask involves the analysis of the data collected in the
previous subtask, in order to guide the preparation of the Experimental
Battery (Subtask 13) for administration to the FY86/87 cohort (Subtask
14). There are two major analytic efforts: a concurrent validity
investigation of the Trial Battery and a predictive validity investigation
of the Preliminary Battery.

Throughout these subtask procedures, Task 1 and Task 2 staff will work closely.
Task 2 staff will bear primary responsibility for the analyses mentioned in
points 1 and 4 below, while Task 1 staff will bear primary responsibility
for points 2, 3, and 5 below and will provide appropriate data files for
all analyses.

Procedures. We first discuss the concurrent validity analyses. These fall
into five general categories.

1. Reliability/Generalizability. We are concerned here with the internal
structure of each measure and the extent to which the observed score
on a measure is affected by extraneous factors, that is, the extent of
generalizability of observed scores.

The internal structure of each measure can be examined by the usual
internal consistency estimates (coefficient alpha or KR-20). Where
these estimates are lower than desired (say, less than .85), there may
be multidimensionality in the item set. In such cases, factor
analyses of the items may be used to identify unidimensional subsets
of the items. Assistance from the Task 1 team will be used at this
point so that the most appropriate factor analysis methods are used.
We should point out here that unidimensionality is desirable, but will
not be pursued to the detriment of achieving validity in predicting
job performance. As members of the Scientific Advisory Group have
pointed out, multidimensionality is not necessarily a problem for a
predictor.
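Coefficient alpha can be computed directly from an examinee-by-item score table; for items scored 0/1 it reduces to KR-20. The Python sketch below is illustrative (the function name and data layout are assumptions, and the responses are made up) and flags a measure falling below the .85 working threshold mentioned above.

```python
from statistics import pvariance


def coefficient_alpha(items: list[list[float]]) -> float:
    """Cronbach's coefficient alpha.

    items[j][i] is examinee i's score on item j. Population variances are
    used throughout; with 0/1 items this is identical to KR-20.
    """
    k = len(items)
    n_examinees = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n_examinees)]
    sum_item_variances = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1.0 - sum_item_variances / pvariance(totals))


# Hypothetical 0/1 responses: 4 items answered by 6 examinees.
responses = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
]
alpha = coefficient_alpha(responses)
print(f"alpha = {alpha:.2f}", "(check dimensionality)" if alpha < 0.85 else "")
```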

Latent trait methods may then be used to calibrate items within the
unidimensional item sets, although more traditional methods of item
analysis will also be used, especially for predictors where
unidimensionality is not as important.


The effects of extraneous factors on observed test scores will be
investigated by using the generalizability theory approach. Data will
have been obtained on order of administration, practice, time (i.e.,
test-retest), and fakeability for those variables for which it is
appropriate. For example, fakeability data will be collected for the
bio/vocational interest inventory but not for paper-and-pencil
cognitive tests, and practice data will be collected for psychomotor
measures but not for the bio/vocational interest inventory. If
possible, balanced factorial designs will have been used to collect
these data wherever interactions between effects are hypothesized.
For most variables, however, we think it is reasonable to assume no
interactions. This will allow us to compare the scores for persons
under a given experimental condition to a large "normal" group; that
is, to carry out independent, single-effect analyses of practice,
time, etc.
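As one concrete illustration of the generalizability theory approach, the sketch below handles the simplest design named above: a persons x occasions (test-retest) table in which every soldier is scored on every occasion. It estimates variance components from the usual two-way ANOVA mean squares and returns the relative G coefficient for a decision study averaging over a chosen number of occasions. The function name and data are hypothetical, and designs with more facets (order, practice, faking) would need additional terms.

```python
from statistics import fmean


def g_study_p_by_o(scores: list[list[float]],
                   n_occasions_decision: int = 1) -> dict[str, float]:
    """One-facet (persons x occasions) G study from a complete score table.

    scores[p][o] is person p's observed score on occasion o. Returns the
    estimated variance components and the relative G coefficient for a
    decision study that averages over n_occasions_decision occasions.
    """
    n_p, n_o = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    person_means = [fmean(row) for row in scores]
    occasion_means = [fmean(row[o] for row in scores) for o in range(n_o)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_p = n_o * sum((m - grand) ** 2 for m in person_means)
    ss_o = n_p * sum((m - grand) ** 2 for m in occasion_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_o

    ms_p = ss_p / (n_p - 1)
    ms_o = ss_o / (n_o - 1)
    ms_res = ss_res / ((n_p - 1) * (n_o - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    var_res = max(ms_res, 0.0)
    var_p = max((ms_p - var_res) / n_o, 0.0)
    var_o = max((ms_o - var_res) / n_p, 0.0)
    g_relative = var_p / (var_p + var_res / n_occasions_decision)
    return {"person": var_p, "occasion": var_o,
            "residual": var_res, "g_relative": g_relative}


# Hypothetical test-retest scores for four soldiers tested twice, 30 days apart.
retest = [[50.0, 52.0], [41.0, 44.0], [60.0, 58.0], [47.0, 49.0]]
print(g_study_p_by_o(retest, n_occasions_decision=1))
```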

Where extraneous factors are found to affect observed scores, test
procedures or content will need to be evaluated with regard to changes
that may obviate such effects. For example, where practice effects
occur, testing procedures will need to be arranged to provide a longer
"warm-up" or unscored testing time. If faking appears to affect
scores, items must be appropriately weighted to detect the type and
direction of faking, and instructions for tests or inventories altered
to prevent faking.


2. Validity/Fairness. As we said, the FY83/84 cohort data will provide
the opportunity for concurrent validity analyses. To review,
criterion data will be available from measures developed in Tasks 4
and 5. We will have the ASVAB scores and other pre-induction measures
for the FY83/84 cohort in the longitudinal data base. Finally, we
will have Trial Battery data from 1,000 new recruits collected
coincident with the collection of data on the FY83/84 cohort.

All these data will be used to estimate the validity and fairness of
the new predictor measures. The overriding objective of these
analyses is to identify the most efficient set of new predictors that
increases the accuracy of prediction of soldiers' job performance in a
manner that is fair for race/sex subgroups. Classification of
soldiers is also an important objective, but is dependent upon the
identification of predictors that add to accuracy of prediction. (It
is also the case, however, that classification considerations will
have an important effect on decisions about which new predictors
should be retained for further investigation, since a predictor may
add little to a general prediction equation yet still be very useful
in differentiating success in different occupations.)

There are at least two basic approaches to reducing the size of a
predictor battery. The first is an internal structure approach, and the
second an external validity approach. In the first approach, we
choose the subset of predictors that: (a) have high internal
consistency reliabilities for each measure, and (b) have very low
correlations between measures. In the second approach, we choose those
predictors that: (a) have high correlations with external criteria of
interest, and (b) have minimal correlations with each other. Both of these
approaches will be applied at the item level or the test (scale) level.
Furthermore, tests or scales will be rescaled so that nonlinear test
scoring methods can be evaluated. Finally, items or tests that appear
useful with regard to the above internal structure and/or external
validity criteria must be evaluated with regard to fairness for
various subgroups. The ideal predictor item, then, will have the
following characteristics: (a) high correlation with the predictor
scale it purports to measure; (b) low correlation with other predictor
scale scores (and items in those scales); (c) high correlation with
Army success and/or job performance criteria; and (d) similar response
characteristics and relationships with external criteria across race
and sex subgroups, that is, "fair." The ideal predictor scale will
have similar characteristics: (a) high internal consistency, (b) low
correlations with other predictor scales, (c) high correlations with
Army success and/or job performance criteria, and (d) fairness.
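One simple way to operationalize the external validity approach is greedy forward selection: starting from nothing, repeatedly add the predictor that most raises the squared multiple correlation with the criterion, which automatically penalizes predictors highly correlated with those already chosen. The plan does not commit to this algorithm; the sketch below is one illustrative choice that works from a predictor intercorrelation matrix `rxx` and predictor-criterion correlations `rxy`, with hypothetical function names and made-up correlations.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(rxx: list[list[float]], rxy: list[float], subset: list[int]) -> float:
    """R^2 of the criterion on the predictors in subset: r_xy' R_xx^-1 r_xy."""
    a = [[rxx[i][j] for j in subset] for i in subset]
    b = [rxy[i] for i in subset]
    beta = solve(a, b)  # standardized regression weights
    return sum(w * r for w, r in zip(beta, b))


def forward_select(rxx: list[list[float]], rxy: list[float], max_size: int) -> list[int]:
    """Greedy forward selection of predictor indices by gain in R^2."""
    chosen: list[int] = []
    remaining = list(range(len(rxy)))
    while remaining and len(chosen) < max_size:
        best = max(remaining, key=lambda j: r_squared(rxx, rxy, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical correlations: predictors 0 and 1 overlap heavily (r = .90).
rxx = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
rxy = [0.50, 0.45, 0.30]
print(forward_select(rxx, rxy, max_size=2))  # [0, 2]
```

Predictor 1 is nearly as valid as predictor 0 but adds almost nothing once predictor 0 is in the equation, so the less valid but non-redundant predictor 2 is chosen instead. In practice a selected set would need cross-validation.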

There are many statistical methods available for use in achieving the
above analysis objectives, and we will work closely with Task 1 staff
to identify and use those methods most appropriate for the particular
analysis problem. (Some of these techniques are exploratory and
confirmatory factor analyses; multiple regression and other prediction
optimization algorithms; and the Cleary, Thorndike, and other models of
test fairness.) We should also note that we will not duplicate analysis
efforts performed by Task 1 staff; indeed, we envision very close
cooperation in the planning of analyses.

As noted above, a major limitation of the inferences to be made from
analysis of the FY83/84 cohort arises from the fact that it is a
concurrent validity design. About 30 percent of the cohort may have
attrited, and those remaining will have had many months of Army
experience, including training designed to improve the soldiers' skills in
areas appropriate to overall soldier performance and specific job
performance. As noted earlier (p. 2-76), we will have Trial Battery
predictor data available for 1,000 new recruits. These data will
enable us to estimate the effects of restriction of range and changes
in the factor structure of the Trial Predictor Battery due to
attrition and experience.
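One standard way to use the new-recruit data for this purpose is the Thorndike Case II correction, which estimates validity in the unrestricted group from the validity observed in the restricted (incumbent) group and the ratio of unrestricted to restricted predictor standard deviations. The sketch assumes direct selection on the predictor; because attrition also operates on other variables, Case II is only an approximation here, and the plan does not specify which correction will be used. Names and inputs are hypothetical.

```python
import math


def correct_range_restriction(r: float, sd_unrestricted: float,
                              sd_restricted: float) -> float:
    """Thorndike Case II correction for direct range restriction on the predictor.

    r: validity observed in the restricted (incumbent) sample.
    sd_unrestricted: predictor SD in the applicant-like group (e.g., new recruits).
    sd_restricted: predictor SD in the restricted sample.
    """
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1.0 - r * r + r * r * u * u)


# Hypothetical values: observed r = .30, predictor SD 10 in new recruits, 7 in incumbents.
print(round(correct_range_restriction(0.30, 10.0, 7.0), 3))
```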

Utility. Information will be provided about the utilities of various
levels of performance in various MOS so that we may more adequately
evaluate the incremental validity and utility of the predictors.
(Scaled utility values will be obtained by Task 4 staff.) Therefore,
a straightforward analysis of incremental validity, such as increments
in R² when new predictors are added to currently available predictors,
must be informed by, and coordinated with, the utility analyses of
Task 1. There are basically two questions to be answered: (a) Do one or
more new predictors increase predictive accuracy over that available with
current predictors? and (b) If so, by how much does the utility
increase exceed any additional costs of recruitment, assessment, and
induction?
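The two questions can be made concrete with the Brogden-Cronbach-Gleser utility model, one widely used formulation; the plan does not say which model Task 1 will adopt. Under top-down selection from a normally distributed applicant pool, the gain from raising validity from R_old to R_new is N_selected x (R_new - R_old) x SD_y x (mean standardized predictor score of those selected), from which the added assessment cost is subtracted. All names and inputs below are hypothetical; SD_y, the dollar-scaled spread of performance, would come from the scaled utility values.

```python
import math
from statistics import NormalDist


def mean_z_of_selected(selection_ratio: float) -> float:
    """Mean standardized predictor score of those selected top-down from a
    normal applicant pool (0 < selection_ratio < 1): the normal ordinate at
    the cutoff divided by the selection ratio."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - selection_ratio)
    return std_normal.pdf(cutoff) / selection_ratio


def net_utility_gain(r2_old: float, r2_new: float, n_selected: int,
                     sd_y: float, selection_ratio: float,
                     extra_cost_per_applicant: float) -> float:
    """Brogden-Cronbach-Gleser net gain from adding predictors.

    r2_old / r2_new are squared multiple correlations (R^2) without and
    with the new predictors; the model works with R, so take square roots.
    """
    delta_r = math.sqrt(r2_new) - math.sqrt(r2_old)
    n_applicants = n_selected / selection_ratio
    gain = n_selected * delta_r * sd_y * mean_z_of_selected(selection_ratio)
    return gain - n_applicants * extra_cost_per_applicant


# Hypothetical inputs: R^2 rises from .25 to .30 for 1,000 selectees.
print(round(net_utility_gain(0.25, 0.30, 1000, 5000.0, 0.5, 20.0)))
```

A positive result answers question (b) in the affirmative under these assumptions; the sign and size depend heavily on SD_y and the cost figure.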

Investigation of Predictor Scales. A related set of analyses concerns
the investigation into the psychometric and psychological meaning of
various intervals on the new predictor scales. Each new predictor
scale will be investigated to find out (a) how persons falling in each
quintile, for example, score on other predictors and on various
criterion measures; and (b) ceiling effects, floor effects, and changes
in the error of measurement across quintiles. Of course, examinees
can be grouped into either finer or coarser gradations as indicated by
the data.
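A minimal version of the quintile analysis in (a) groups examinees by quintile of one new predictor scale and reports the mean of another variable (another predictor or a criterion) within each group. The sketch uses the standard-library `statistics.quantiles` for the cut points; the function name is hypothetical, and scores tied exactly at a cut point fall into the lower group.

```python
from statistics import fmean, quantiles


def quintile_means(predictor: list[float], other: list[float]) -> list[float]:
    """Mean of `other` within each quintile of `predictor` (lowest first)."""
    cuts = quantiles(predictor, n=5)  # four cut points
    groups: list[list[float]] = [[] for _ in range(5)]
    for p, o in zip(predictor, other):
        k = sum(p > c for c in cuts)  # number of cut points exceeded: 0..4
        groups[k].append(o)
    return [fmean(g) if g else float("nan") for g in groups]


# Hypothetical: 100 examinees whose criterion score rises with the predictor.
predictor_scores = list(range(1, 101))
criterion_scores = [2 * p for p in predictor_scores]
print(quintile_means(predictor_scores, criterion_scores))
```

Changing `n=5` gives the finer or coarser gradations mentioned above.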

Predictive Validity Analyses. These analyses will be used as a point
of comparison to the concurrent validity analyses. It is extremely
likely that many constructs, if not actual measures, will be in common
between the Trial Battery used in the concurrent validity investigation
and the Preliminary Battery. In fact, as we remarked earlier in
Subtask 4, we will attempt to have some of the same scales in both
batteries. Thus, we will have available concurrent and predictive
validity coefficients for some subset of the constructs measured in
the Trial Battery. In addition, a subset of the sample will have
completed the Preliminary Battery when they were in AIT (Subtask 3)
and the Trial Battery (in Subtask 11) and will have job performance
criterion scores on record. Although maturation effects make
interpretation less than straightforward, we will thus have empirical
correlations between scores on the Preliminary and Trial Battery
measures. The actual predictive validity analyses, of course, will
consist primarily of correlations between the Preliminary Battery
measures (administered in Subtask 3) and the criterion data collected
in Subtask 11.

These two sets of analyses, i.e., the predictive and concurrent
validity analyses, will be integrated and a report of findings will be
submitted to ARI, with suggestions for revisions of the Trial Battery.
It is our belief that these suggestions will consist primarily of
reductions in the battery, at the item level, the scale level, or both.
This report will be discussed with ARI in order to make final revision
plans. Draft and final field test plans for the FY86/87 cohort data
collection will be prepared after the revisions have been
approved by ARI.

Subtask 13: Prepare Experimental Battery for Administration to FY86/87
Cohort

Rationale. The purpose of this subtask is to make the revisions to the
Trial Battery that were decided upon in the previous subtask, i.e., prepare
the Experimental Battery, prepare associated administration materials, and
train the personnel who will be administering the battery.

This final version of the Experimental Battery will be administered to an
average of 2,200 recruits as they enter AIT for each of the MOS selected as
the focus of the project. As noted in the Introduction to this plan, the
rationale for this sample size is to provide a sufficiently large
longitudinal sample for predictive validity analyses of the Experimental
Battery, given anticipated rates of sample attrition. The experience gained
in all prior battery administrations will be used to prepare for this
administration, especially the administration of the Preliminary Battery to
the FY83/84 longitudinal sample, since it will also have taken place at AIT
sites which will be involved in the FY86/87 cohort administration.

Procedures. After final revisions are made, sufficient quantities of the
printed portions of the battery will be procured. Then, apparatus and
microprocessors must be obtained as previously noted. Detailed
administration manuals will be written and a one-to-two-day training
program developed. Site visits of approximately one week's duration will
then be made to the TRADOC posts where data collection will occur. Persons
selected as test administrators will be trained, facilities inspected, and
the apparatus and computers put in place.


The exact details of the steps outlined in the above paragraph depend
entirely on the final contents of the Experimental Battery. At this point,
we think it will consist of a two-to-three-hour battery that will be a
mixture of paper-and-pencil and computer-administered tests. As much as
possible, computer-administered testing will be used in order to reduce
testing time. This implies that the administrators must be experienced
enough, or receive sufficient training, to handle computer-administered
tests. Local, on-site administrators will be required for the same reasons
as were cited for the Preliminary Battery administration (i.e., the
infeasibility of Task 2 staff being on site over the entire one-year period
that data are collected; see Subtask 3). By this point in the project the
administration procedures should be very well honed, and we should be able
to train test administrators for virtually all problems or contingencies.

Also, just as for the Preliminary Battery, one person will be designated
as the primary contact and will be responsible for security of testing
materials, shipping completed batteries, monitoring attendance at testing
sessions, and communicating with Task 2 staff.

Finally, we note that we are currently conducting site visits at TRADOC
posts that would likely be involved in the test administration and are
informing them of the nature of the demands on soldier time and the need
for administration personnel. To date, the information obtained from these
visits indicates that the plans can be carried out.

Subtask 14: Monitor Administration of Experimental Battery to FY86/87
Cohort; Further Analyses of FY83/84 Cohort Data

Rationale. Although Task 2 staff will have trained local, on-site
personnel and they will be administering the Experimental Battery, close
monitoring of the administration process will be required. Given the
somewhat limited time that will have been available for analyses of the
FY83/84 cohort in Subtask 12, further analyses will also be completed
during this subtask.


Procedures. Task 2 staff will make several scheduled and unscheduled
visits to each test site to observe test administration and test security
procedures, and to address any problems that occur during the
administration process, which will run for one entire year, from March 1986
through February 1987. We will also set up a regular, by-phone reporting
procedure after every weekly administration. (The battery will be
administered, if possible, during the first week of the soldiers' AIT in
order to reduce the effects of training on Experimental Battery scores.)

We will monitor the numbers of soldiers tested so that progress toward
target sample sizes can be tracked. It is probable that testing will not be
required for every class in all 19 MOS, and this tracking process will be
the means by which the actual administration sessions are controlled.
Finally, with regard to monitoring, Task 2 staff will be continuously
available to answer questions (via phone) or to make short-notice visits in
response to problems.


The exact nature of the further analyses of FY83/84 cohort data is
dependent upon what has been accomplished earlier. We note here that any
loose ends will be tied up and interesting further analyses will be pursued
at this time.

Subtask 15: Analyze FY86/87 Cohort Data and Prepare Final Report

Rationale. Task 2 concludes with this subtask. The purposes of the
subtask are to compare the FY83/84 cohort and FY86/87 cohort in terms of
their success on Experimental Battery measures, analyze the covariance of
the final predictor battery measures within itself and with then-current
pre-induction measures, and analyze the relationship of the final
Experimental Battery to training performance measures. The ultimate goal of
these analyses is to identify and recommend the best battery for
operational use, based on all data at hand. (These recommendations will be
subject to later revision, however, since a follow-up predictive validity
investigation of the final Experimental Battery will be completed when
second-tour performance measures are available for the FY86/87 cohort.)


Procedures. The following sets of analyses will be carried out:

1. Comparison of FY83/84 and FY86/87 cohort data. The primary foci of
this analysis will be range restriction, factor structure, and
psychometric/psychological meaning of the Experimental Battery.
Recall that the final battery administered to the FY86/87 cohort will
be a subset of that administered to the FY83/84 cohort. Therefore,
the batteries will be different in the contextual sense; that is, the
FY83/84 cohort will have completed more predictors, slightly longer
predictors, and perhaps in a different order. This somewhat limits
the interpretations that can be placed on comparisons of the two sets
of data, but not unduly so. It will still be the case that the
FY86/87 cohort battery will be a subset of the FY83/84 battery, so
equivalent sets of tests and/or items for both samples can be
assembled.

Item and scale distributions will be computed and compared to identify
range restriction effects. (Task 1 researchers could then use these
data to refine the earlier FY83/84 cohort analyses, i.e., assemble
scores on the reduced predictor battery and correct the relationships
of these scores with job performance criteria for restriction of
range.)

Confirmatory factor analysis techniques will be used to see if the
factor structure of the new predictors in the FY83/84 cohort applies
to the FY86/87 cohort. If not, then factor analyses will be done to
identify the differences in factor structure. (A major concern will
be the attribution of factor structure differences. Are they due to
true cohort differences, FY83/84 vs. FY86/87 recruits, or to
attrition and experience, which are present in the FY83/84 cohort but
not the FY86/87 cohort? Recall that data collected concurrently with
the FY83/84 cohort data will be available from a sample of 1,000 new
recruits. These data will be of obvious usefulness for probing this
question.)
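Confirmatory factor analysis requires dedicated software, but a quick descriptive check often reported alongside it is Tucker's congruence coefficient between matching columns of factor loadings estimated separately in the two cohorts. This is a simpler, swapped-in technique, not the confirmatory procedure the plan describes; it assumes the loadings refer to the same measures in the same order and that factors have already been matched across the two solutions. Values near 1 indicate very similar factors. The loadings below are hypothetical.

```python
import math


def congruence(loadings_a: list[float], loadings_b: list[float]) -> float:
    """Tucker's coefficient of congruence between two factor-loading columns."""
    num = sum(a * b for a, b in zip(loadings_a, loadings_b))
    den = math.sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return num / den


# Hypothetical loadings of five measures on one factor in each cohort.
fy83_84 = [0.72, 0.65, 0.58, 0.10, 0.05]
fy86_87 = [0.70, 0.60, 0.62, 0.15, 0.02]
print(round(congruence(fy83_84, fy86_87), 3))
```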


Finally, the FY86/87 cohort data will be analyzed to discover the
psychological/psychometric meaningfulness of the predictors and
various scale intervals on the predictors. These analyses will
benefit from prior similar analyses of the FY83/84 cohort data, which
will provide direction for these analyses. The investigation of
factor structure will also inform these analyses and aid in focusing
this effort.

2. Relationship to training criteria. Training criteria data will be
available for the FY86/87 cohort (from Task 3). The relationships of
the Experimental Battery to these criteria will be thoroughly
investigated. These analyses will focus on the absolute and
incremental validity of the Experimental Battery for predicting training
completion and success, although another interesting problem is the
prediction of success at various stages of training. If appropriate
training criteria are available, these kinds of analyses will also be
completed.
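Suppose there is one existing predictor (for example, an ASVAB composite) and one Experimental Battery score. Both absolute and incremental validity can then be computed from zero-order correlations alone. The sketch below uses the standard two-predictor formula for the squared multiple correlation. The function names and example correlations are hypothetical, and more than two predictors would require a full regression.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of criterion y on predictors 1 and 2,
    computed from the three zero-order correlations (requires |r_12| < 1)."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


def incremental_validity(r_y1, r_y2, r_12):
    """Gain in R^2 from adding predictor 2 (a new battery score) to predictor 1
    (an existing composite) when predicting criterion y."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2


# Hypothetical correlations with a training criterion:
# existing composite .50, new measure .40, correlation between the two predictors .30.
print(round(incremental_validity(0.50, 0.40, 0.30), 4))
```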

3. Covariances of Experimental Battery measures. Correlation matrices
and factor analyses will be completed to identify redundancies within
the battery itself.

4.  Covariances  of  Experimental  Battery  measures  and  current  pre-induction 
measures.  Correlation  matrices  and  factor  analyses  of  the  two  sets  of 
measures  will  be  completed  to  identify  redundancies  across  the  two 
sets  of  measures. 
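The correlation-matrix step in items 3 and 4 can be sketched as a scan over all pairs of measures for correlations at or above a cutoff. The .80 cutoff, the function names, and the toy data are assumptions for illustration. The plan pairs this step with factor analysis, which is not shown here.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(scores, threshold=0.80):
    """Return (measure_a, measure_b, r) for every pair whose |r| is at least `threshold`.

    scores maps a measure name to the scores of the same examinees, in the same order.
    """
    flagged = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson_r(scores[a], scores[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical scores of five examinees on three measures.
scores = {
    "asvab_composite": [48, 52, 61, 55, 70],
    "new_reasoning": [20, 22, 27, 24, 31],
    "interest_scale": [12, 18, 9, 15, 11],
}
for a, b, r in redundant_pairs(scores):
    print(f"{a} / {b}: r = {r:.2f}")
```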




5. Classification analyses. The FY86/87 cohort sample will be composed
of soldiers from MOS of different occupational types, intended to
represent the diversity of all Army jobs. This provides the
opportunity, at a minimum, to examine score differences between MOS
groups on the Experimental Battery measures in order to gauge their
value for classification purposes. More sophisticated analyses, such
as multiple discriminant function analyses or centour analyses, are
certainly possible, given the sample sizes. However, we must keep in
mind that these classifications, or group memberships, have not been
made optimally, so caution must be exercised.

An interesting possibility, however, would be to identify "outliers" within
these groups and examine their score profiles. These outliers will be
particularly interesting to follow up in terms of tenure and job
performance as those data become available. Draft and, after ARI review,
final technical reports will be prepared on the final set of recommended
instruments and on all technical work performed on Task 2.
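The plan does not define an outlier. As one possibility, the search for outliers within MOS groups could flag soldiers whose score profile lies far from their group's centroid. The sketch below assumes a root-mean-square z-score distance and a cutoff of 2.5. Both choices are illustrative, and this distance ignores correlations among measures, so it simplifies Mahalanobis distance.

```python
import math
from statistics import mean, stdev


def profile_outliers(group_profiles, cutoff=2.5):
    """Flag members of one MOS group whose score profile lies far from the group centroid.

    group_profiles maps a soldier id to a list of scores (same measures, same order).
    Distance is the root-mean-square of per-measure z-scores. This ignores the
    correlations among measures, so it is a simplification of Mahalanobis distance.
    Every measure must vary within the group (a zero SD would divide by zero).
    """
    ids = list(group_profiles)
    n_measures = len(group_profiles[ids[0]])
    columns = [[group_profiles[i][k] for i in ids] for k in range(n_measures)]
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    flagged = {}
    for i in ids:
        z = [(x - c) / s for x, c, s in zip(group_profiles[i], centers, spreads)]
        distance = math.sqrt(sum(v * v for v in z) / n_measures)
        if distance >= cutoff:
            flagged[i] = distance
    return flagged


# Hypothetical group: ten similar profiles and one unusual one.
profiles = {f"s{k}": [50 + k % 3, 48 + k % 2] for k in range(10)}
profiles["s_unusual"] = [85, 90]
print(profile_outliers(profiles))
```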


SUMMARY  OF  EXPECTED  OUTCOMES  FROM  TASK  2 


Operational  Outcomes 

1. Non-cognitive Attributes Inventory. This will be a relatively short,
untimed, paper-and-pencil inventory suitable for administration at
Military Entrance Processing Stations (MEPS). The Inventory will contain the
most efficient set of measures of biographical data and vocational
interests that proves useful for the selection and classification of
applicants. Scores on the Inventory scales will be input to the selection
and classification algorithms. Another possible use of this Inventory
is its administration at recruiting stations, where recruiters could use
the scores generated from the Inventory to counsel recruits in their
choice of MOS. This latter use is especially feasible if the
capability for computerized administration and scoring is in place,
which would go a long way toward overcoming "test security" problems.
Although the bulk of the research with the Inventory will most likely
be conducted in a paper-and-pencil format, this instrument would be
very amenable to conversion to a computer-administered format, and
research will have been conducted to determine the comparability of
results across these two formats.




Perceptual/Psychomotor Battery. This will be a battery of measures in
the perceptual/psychomotor area that will be primarily, if not
completely, computer administered. The measures will tap constructs
that have been shown to account for unique variance over and above that
measured by the ASVAB. (At this point in time, the major unresolved issue is
whether large-scale data can be collected on these computer-administered
measures, given practical constraints of time and money.
We have assumed that positive decisions were reached at the various
decision points outlined in this research plan.)

Another operational outcome will be information about the vulnerability
of the set of non-cognitive measures to differing motivational
sets (comparisons of the responses of soldiers on active duty with those of
applicants at MEPS) and to faking. This information should enable the
Army to make informed decisions about the reliance that can be placed
on these measures in an operational setting.
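The plan does not prescribe a statistic for comparing active-duty soldiers with MEPS applicants. For a single non-cognitive scale, one simple summary of that comparison is a standardized mean difference using the pooled standard deviation. The sketch below treats that choice as an assumption and uses hypothetical scores.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """(mean of group_a - mean of group_b) divided by the pooled within-group SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Hypothetical scale scores: applicants tested at MEPS vs. soldiers on active duty.
applicants = [34, 38, 36, 40, 37, 39]
incumbents = [31, 35, 33, 36, 30, 34]
print(round(standardized_mean_difference(applicants, incumbents), 2))
```

A difference between the two groups would not by itself distinguish motivational effects from other differences between applicants and incumbents.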

Additional Cognitive Measures or Improved Cognitive Measures. These
will be paper-and-pencil measures of cognitive abilities not presently
measured, or improved versions of those currently in use.

In a sense, all the outcomes listed above can be thought of as
optimistic. They assume that the research will show that new
pre-induction measures have incremental validity (over the
current measures) and/or increase the efficiency of classification of
recruits into MOS. The research may show, however, that some or all
of the new measures do not yield such increments. We maintain
that this information is itself a highly valuable operational outcome, since
it would confirm that current pre-induction measures are
effectively performing the job of selection and classification.

Scientific  Outcomes 

1. Delineation of empirical relationships between measures of human
attributes across major domains. Although relationships between
measures within major domains (e.g., within traditional cognitive tests)
have been fairly well mapped out, there is much less information
available about relationships between measures from different
domains. Task 2 research will provide such empirical information.

2. Tests of Validity in Applied Settings. Several of the measures that
will likely be used in this project have relatively little evidence
of their validity in applied settings, i.e., for predicting
success in training and on the job. Interest measures have been
shown to predict occupational entry and longevity but have been less
well researched with respect to degree of successful job performance.


The newer cognitive/perceptual measures have generally not been
evaluated in applied settings (Hunt, 1983), and, to a lesser extent, this is
true of psychomotor measures. Task 2 research will provide a
rigorous investigation of the "applied validity" of such measures.

Incremental Validity. The points made just above apply equally well
to the question of incremental validity. Task 2 should be able to
provide a definitive answer to the question of how much the accuracy
of predicting job performance in disparate jobs can be increased
by adding non-cognitive measures, perceptual/psychomotor
measures, or additional cognitive measures to the ASVAB,
which is an excellent representative of the traditional cognitive tests
used to predict training and job performance.

Linear Composite vs. Subgroup Approaches to Selection/Classification.
Owens and Schoenfeldt (1979) have championed a subgrouping approach to
the problem of prediction, in contrast to the more commonly employed
approach of linear composites. Very briefly, the subgrouping approach
advocates classifying persons into one of a finite set of
groups, based on their scores on a set of measures, and then making similar
predictions for individuals in the same group. The linear
composite approach advocates measuring persons on several measures and
then applying a set of linear weights to each person's scores on those
measures to make predictions. Task 2 will provide sufficient measures
to operationalize both methods and compare their effectiveness in
selection and classification.
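The contrast between the two approaches can be made concrete with a small sketch. The subgroups are assumed to have been formed already, so their centroids and criterion means are hypothetical inputs. Nearest-centroid assignment is an illustrative assumption, not necessarily the Owens and Schoenfeldt procedure. The regression weights are likewise hypothetical.

```python
def linear_composite_prediction(scores, weights, intercept=0.0):
    """Linear-composite approach: intercept plus a weighted sum of one person's scores."""
    return intercept + sum(w * x for w, x in zip(weights, scores))


def nearest_subgroup(scores, centroids):
    """Assign a person to the subgroup whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda g: sum((x - c) ** 2 for x, c in zip(scores, centroids[g])),
    )


def subgroup_prediction(scores, centroids, subgroup_criterion_means):
    """Subgrouping approach: everyone assigned to the same subgroup receives the same
    prediction, here the criterion mean previously observed for that subgroup."""
    return subgroup_criterion_means[nearest_subgroup(scores, centroids)]


# Hypothetical two-measure example.
centroids = {"A": [0.0, 0.0], "B": [10.0, 10.0]}
criterion_means = {"A": 40.0, "B": 70.0}
person = [9.0, 8.0]
print(linear_composite_prediction(person, weights=[1.5, 2.0], intercept=20.0))
print(subgroup_prediction(person, centroids, criterion_means))
```

The linear composite gives each person an individually weighted prediction. The subgroup approach gives one prediction per group.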


Validity Generalization. Recent research has shown that the
validities of cognitive tests generalize quite well across different
kinds of settings and occupations (Schmidt, Hunter & Pearlman, 1981;
Schmidt & Hunter, 1978; Schmidt, Hunter, Pearlman & Shane, 1979).
Little or no research exists, however, on the degree to which the
validities of other types of predictors generalize. Task 2 provides the
opportunity to extend the investigation of validity generalization to these other
types of predictors. Measures of biographical data, vocational
interests, and perceptual and psychomotor abilities will be administered
to soldiers in a variety of MOS, and school, attrition, and job
performance criteria data will be available. Validity generalization
analyses will be conducted for all predictor measures, as well as
predictor composites. These findings should contribute significantly
to the growing body of knowledge about validity generalization.
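The plan does not detail its validity generalization procedures. The Schmidt-Hunter approach cited above has a simplest form, often called a "bare-bones" analysis. It asks how much of the variation in validity coefficients across jobs is attributable to sampling error alone. Full procedures also correct for criterion unreliability and range restriction, which this sketch omits. The study values are hypothetical.

```python
def bare_bones_meta_analysis(studies):
    """Bare-bones validity generalization (sampling-error correction only).

    studies: list of (observed validity coefficient r, sample size N) pairs.
    Returns the N-weighted mean r, the N-weighted observed variance of r, the
    variance expected from sampling error alone, and the residual variance
    (floored at zero).
    """
    total_n = sum(n for _, n in studies)
    mean_r = sum(r * n for r, n in studies) / total_n
    var_observed = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    mean_n = total_n / len(studies)
    var_sampling_error = (1.0 - mean_r ** 2) ** 2 / (mean_n - 1.0)
    var_residual = max(var_observed - var_sampling_error, 0.0)
    return mean_r, var_observed, var_sampling_error, var_residual


# Hypothetical validities of one predictor in four MOS.
studies = [(0.22, 150), (0.31, 220), (0.18, 90), (0.27, 310)]
mean_r, var_obs, var_err, var_res = bare_bones_meta_analysis(studies)
print(f"mean r = {mean_r:.3f}, observed var = {var_obs:.4f}, "
      f"sampling-error var = {var_err:.4f}, residual var = {var_res:.4f}")
```

A residual variance that is small relative to the observed variance indicates that most of the between-job variation in validities is attributable to sampling error.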


REFERENCES 


Bownas, D.A., & Heckman, R.W. Job analysis of the entry-level firefighter
position. Minneapolis: Personnel Decisions, Inc., 1976.

Brogden, H.E. Increased efficiency of selection resulting from replacement
of a single predictor with several differential predictors.
Educational and Psychological Measurement, 1951, 11, 173-196.

Callender, J.C., & Osburn, H.G. Testing the constancy of validity with
computer-generated sampling distributions of the multiplicative model
variance estimate: Results for petroleum industry validation
research. Journal of Applied Psychology, 1981, 66, 274-281.

Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. The
dependability of behavioral measurements: Theory of generalizability
for scores and profiles. New York: Wiley, 1972.

Dunnette, M.D. Basic attributes of individuals in relation to behavior in
organizations. In M.D. Dunnette (Ed.), Handbook of industrial and
organizational psychology. Chicago: Rand McNally, 1976.

Dunnette, M.D., & Borman, W.C. Personnel selection and classification
systems. In M.R. Rosenzweig & L.W. Porter (Eds.), Annual Review of
Psychology. Palo Alto, CA: Annual Reviews, Inc., 1979, 30, 477-525.

Dunnette, M.D., Rosse, R.L., Houston, J.S., Hough, L.M., Toquam, J.,
Lammlein, S., King, K.M., Bosshardt, M.J., & Keyes, M.A. Development
and validation of an industry-wide electric power plant operator
selection system. Minneapolis: Personnel Decisions Research
Institute, 1981.

Guion, R.M. On trinitarian doctrines of validity. Professional
Psychology, 1980, 11, 385-390.

Hunt, E. On the nature of intelligence. Science, 1983, 219, 141-147.

Hunter, J.E., Schmidt, F.L., & Jackson, G.B. Integrating research findings
across studies. Unpublished paper in Methodological innovation in
studying organizations. Symposium presented at the Center for
Creative Leadership, Greensboro, NC, 1981.

Linn, R.L., Harnisch, D.L., & Dunbar, S.B. Corrections for range
restriction: An empirical investigation of conditions resulting in
conservative corrections. Journal of Applied Psychology, 1981, in
press.

Maier, M.H. Validation of selection and classification tests in the Army
(Working Paper: Personnel Utilization Area 82-2). Alexandria, VA:
U.S. Army Research Institute for the Behavioral and Social Sciences,
1981.

Messick, S. Test validity and the ethics of assessment. American
Psychologist, 1980, 35, 1012-1027.




Owens, W.A., Jr. Background data (Chapter 14, pp. 609-645). In M.D.
Dunnette (Ed.), Handbook of industrial and organizational psychology.
Chicago, IL: Rand McNally, 1976.

Owens, W.A., & Schoenfeldt, L.F. Toward a classification of persons.
Journal of Applied Psychology, 1979, 64, 569-607.

Pearlman, K., Schmidt, F.L., & Hunter, J.E. Validity generalization
results for tests used to predict job proficiency and training success
in clerical occupations. Journal of Applied Psychology, 1980, 65,
373-406.

Peterson, N.G., & Bownas, D.A. Task structure and performance acquisition
(Chapter 3). In M.D. Dunnette & E.A. Fleishman (Eds.), Human
capability assessment. New York: Lawrence Erlbaum & Associates.


Peterson, N.G., & Houston, J.S. The prediction of correctional officer job
performance: Construct validation in an employment setting. Minneapolis:
Personnel Decisions Research Institute.


Peterson, N.G., Houston, J.S., Bosshardt, M.D., & Dunnette, M.D. A study
of the correctional officer job at Marion Correctional Institution:
Development of selection procedures, training recommendations ...
information program. Minneapolis: Personnel Decisions Research
Institute, 1977.


Peterson, N.G., Houston, J.S., & Rosse, R.L. The LOMA job effectiveness
prediction system, technical report #4. Life Office Management
Association, in press.

Schmidt,  F.L.,  &  Hunter,  J.E.  Moderator  research  and  the  law  of  small 
numbers.  Personnel  Psychology,  1978,  31,  215-231. 


Schmidt, F.L., Hunter, J.E., & Caplan, J.R. Validity generalization
results for two job groups in the petroleum industry. Journal of
Applied Psychology, 1981, 66, 261-273.

Schmidt, F.L., Hunter, J.E., & Pearlman, K. Task differences as moderators
of aptitude test validity in selection: A red herring. Journal of
Applied Psychology, 1981, 66, 166-185.

Schmidt, F.L., Hunter, J.E., Pearlman, K., & Shane, G.S. Further tests of
the Schmidt-Hunter Bayesian validity generalization procedure.
Personnel Psychology, 1979, 32, 257-281.

Walker, H.W., & Lev, J. Statistical inference. New York: Henry Holt &
Co., 1953.

Wernimont, P.F., & Campbell, J.P. Signs, samples and criteria. Journal of
Applied Psychology, 1968, 52, 372-376.


TASK  3  RESEARCH  PLAN 


MEASUREMENT  OF  SCHOOL/TRAINING  PERFORMANCE 

GENERAL  PURPOSE  OF  TASK  3 

The general purpose of Task 3 is to generate information about the
performance of soldiers in training, to be used in the validation of initial
predictors and in the prediction of first-tour and second-tour performance
in the Army. To accomplish this purpose, existing measures of training
performance will be analyzed and evaluated, new measures will be developed
where needed, and composite sets of predictor and criterion measures will
be assembled.

As job performance surrogates, training measures can serve to reduce the
time required for predictor validations from years to months. When used to
predict subsequent performance, training measures have the potential to
increase the accuracy of classification into MOS over that obtained by the
use of pre-induction predictors alone. Both the extent to which training
measures can be used as surrogates for more ultimate job performance
criteria and the degree of incremental validity obtained by including training
success as a predictor in its own right will be assessed in Task 1.

A further purpose of Task 3 is to collect and interpret training
performance data derived from recent and current measures and to enter
these data into the Longitudinal Research Data Base (LRDB) for use by other
tasks. Training performance data from the FY81/82 cohort, for example,
will be used by Task 1 to make initial assessments of the ability of
current pre-induction predictors to predict training performance. This
procedure is an evaluation of selection tests rather than of training
effectiveness.


BACKGROUND  ISSUES  AND  RATIONALE 


A principal issue that will be addressed in Task 3 is the definition
of training success. As explained below, this issue is particularly
important because the characteristics to be sought in training measures
may differ according to whether the measures are used as administrative
criteria, as criteria in predictor validation, or as predictors
themselves. This issue also involves a related question: What mechanism
explains the predictive relationship between training performance and job
performance?

It is naturally desirable to use measures that are as reliable and
comprehensive as possible to obtain training performance information in Task 3.
This raises two further issues: (a) how much reliable variance is there in existing
(and newly developed) training measures, and (b) what components of
training performance, if any, are not currently represented by existing
measures? Finally, whether training performance is to serve as criterion or
predictor, the additional question remains of how to sample the training
performance domain.

Definition  of  Training  Success 

The way in which trainee achievement, or success, is conceptualized,
defined, and measured is a function of several factors. The major
considerations are as follows:




What are the overall organizational goals that the
training program is intended to serve? For example, is
it to produce graduates who can quickly step into a
specific job and perform satisfactorily as long as
conditions don't change too drastically, or is it to
prepare individuals for a very dynamic job environment
in which equipment and specific job duties will change
considerably over the individual's tour of duty?

What model or framework was used to design the training
program? (For example, were very specific behavioral
objectives used to specify the content? Was the intent
to teach facts or skills?)

What sources were used to generate the training content?
(For example, supervisory complaints, systematic needs
analysis of job incumbents, human factors specifications
for new situations or equipment, the trainer's theory
about what should be taught.)

What are the objectives for which the criterion
measure will be used? For example:

o to identify which skill and knowledge areas have been
mastered and which need remedial work.

o to evaluate the strengths and weaknesses of the
training program itself.

o to certify the individual as ready for promotion to
the next course or for entry into the job.




Choices among these factors will also influence the degree to
which success in training is related to (correlated with) job performance.
To the extent that (a) a training program is meant to serve as
certification for entry into the job; (b) the content is derived from a job or task
analysis of job incumbents; (c) the training objectives were designed to
cover all major job task factors; (d) teaching more general analytic,
problem-solving, or technical skills is part of the training objectives;
and (e) the training program does not control for individual differences in
ability, the correlation between training achievement and job success
should be maximized.

It follows that if the presence or absence of a correlation between
training and job performance is to be explained or influenced, the above
factors are what must be accounted for. Of particular importance is
whether the content of the training criterion is limited to the specific
training objectives or whether it is sampled from job content.

This entire issue would be moot if training requirements and job
requirements were identical, and making them as similar as possible is the goal
of much of the Army's current training development procedures. But this is
a difficult goal, and differences between the behaviors conducive to
training success and job success will inevitably exist. Some differences
are even inherent in the fact that, to achieve economies of scale, training
must be more formal and structured than the job. Those who learn best in
one situation may not be those who do so in the other. (Certainly there is
anecdotal evidence that many effective job performers were not
distinguished academically.)


In order to assemble appropriate predictor and criterion composites of
training measures, then, it will be necessary to determine the relation of
existing measures to training content and to job content. Likewise, it
will be necessary to investigate the mechanism by which training
performance predicts job performance, by relating both training-specific and
job-specific test items to MOS-specific criteria. To determine the
relation of existing measures to training content and to job content,
measures will be evaluated at Army schools (Subtask 3.2). In addition, the
procedures currently followed to develop Army training content and training
measures will be identified (Subtask 3.3). As required, additional job
knowledge tests will be developed (Subtask 3.4), and job knowledge items
will be identified as school-learned, job-learned, or both (Subtask 3.6).


Reliable  Variance 

Training is designed to eliminate individual differences by bringing each
soldier to the training standard. For improving selection and
classification, however, measures are needed that possess substantial reliable
variance. For this reason, it will be necessary to review existing measures
and examine training courses to seek out components of training performance
that exhibit the greatest amount of true variability.
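The plan does not specify a reliability index. One common index of how much reliable variance a training test contains is coefficient alpha (internal consistency), used below as an assumption. For hands-on tests scored over repeated trials, other reliability designs may fit better. The item scores are hypothetical (1 = go, 0 = no-go).

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a test.

    item_scores: one row per examinee, each row a list of item scores in the same order.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = [[row[j] for row in item_scores] for j in range(k)]
    totals = [sum(row) for row in item_scores]
    item_variance_sum = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / pvariance(totals))


# Hypothetical go/no-go scores of six trainees on four test items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))
```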


As an example, it is a common practice to allow a trainee several attempts
to pass a performance test, without penalizing the soldier for early
failures. For administrative purposes, the primary concern is that each
soldier reach the standard, not whether one soldier takes longer than
another. For the purpose of developing reliable criterion and predictor
measures, however, it would be desirable to refine the scoring procedure
for such a measure in order to extract such information as number of
attempts to reach mastery and time to reach mastery. In particular, it
will be necessary to focus on performance early in training, when the true
variability across students can be expected to be greater.
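The refined scoring described above could be derived from an attempt log, as sketched below. Instead of recording only a final go/no-go, it counts attempts to mastery and time to mastery. The record layout (`Attempt`, `minutes_elapsed`) and the rule of ignoring attempts after the first pass are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One attempt at a performance test (hypothetical record layout)."""
    soldier_id: str
    minutes_elapsed: float  # time since the start of instruction when the attempt was made
    passed: bool


def mastery_indices(attempts):
    """Per soldier: attempts made up to and including the first pass (all attempts if
    the soldier never passed) and elapsed time at the first pass (None if never passed)."""
    result = {}
    for a in sorted(attempts, key=lambda a: (a.soldier_id, a.minutes_elapsed)):
        record = result.setdefault(a.soldier_id, {"attempts": 0, "minutes_to_mastery": None})
        if record["minutes_to_mastery"] is not None:
            continue  # mastery already reached; later attempts are ignored
        record["attempts"] += 1
        if a.passed:
            record["minutes_to_mastery"] = a.minutes_elapsed
    return result


log = [
    Attempt("s1", 30, False), Attempt("s1", 55, True), Attempt("s1", 80, True),
    Attempt("s2", 20, True),
    Attempt("s3", 40, False),
]
for soldier, indices in mastery_indices(log).items():
    print(soldier, indices)
```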

Coverage  of  the  Training  Performance  Domain 

In the past, Army training measurement focused on paper-and-pencil
knowledge tests and thus primarily on the cognitive components of the training
performance domain. In more recent years, the Army has emphasized hands-on
testing, thus capturing perceptual/motor components of the domain as well.
This shift toward "performance testing," however, was due principally to
the administrative decision that the hands-on format was preferable
overall, not to a finding that training success is due almost entirely to motor
skill.

It cannot be assumed, then, that the allocation of paper-and-pencil and
hands-on tests in a given MOS proportionally reflects particular components
of the performance domain. Likewise, there may be components of training
success that are not represented by any existing measures. Accordingly,
Task 3 will develop new methods of assessing requirements that are
presently difficult to assess, new performance indices, and measures of
general performance, as required to represent the domain of training
performance as completely as possible. These measures will be developed in
Subtask 3.5: Construction of Prototype Measures.


Regardless of whether training measures are derived from training or job
content, evaluation of training performance entails selecting a basis for
sampling the content. The measurement literature provides little
systematic guidance for resolving questions about appropriate strategies. The
Interservice Procedures for Instructional Systems Development (1975) lists
these sampling criteria, but leaves the choice to the user: percent of
persons performing, percent of time spent in performing, probable
consequences of inadequate performance, task learning difficulty, probability of
deficient performance, length of time spent in performing, and length of
time between job entry and task performance. The Army, in its Guidelines
for Development of Skill Qualification Tests (1977), lists the following:
known performance deficiencies, tasks contributing to the operation or
maintenance of critical combat systems, tasks related to deficiencies in
crew or unit performance, tasks that have been revealed as important in
prior evaluations, and proportional samples from different content or
functional areas of performance. Again, the choice of a factor or combination
of factors to be used in sampling is left to the user.

One difficulty with many of these factors is that they are defined
in terms of variables external to job behavior itself. As a consequence,
the same behavior evaluated in two different contexts can legitimately be
placed in different categories. Although sampling on the basis of such
extrinsic factors cannot be avoided entirely, it is clearly desirable to
attempt to represent whatever kinds of behavior are present in job
performance. Several bases for identifying different types of behavior are
available. The Instructional Quality Inventory (Ellis, Wulfeck, &
Frederick, 1979) classifies training objectives, test items, and components
of instruction in the following categories: fact, concept, procedure,
rule, and principle. Lumsdaine (1960) suggested the following categories
for the classification of training content: learning identification,
perceptual discriminations, comprehension of principles and relationships,
procedural sequencing, decision-making, and perceptual-motor skills.

When these and similar classification schemes are examined, three major
categories emerge that we consider to have possible implications regarding
test format: content that does not require generalization, e.g., the
application of a procedure; content that requires generalization, e.g., the
application of rules and principles; and content that requires perceptual
and/or motor skill. Components of tasks that require perceptual and/or
motor skill will become candidates for hands-on testing. Other components
of tasks will be classified as requiring or not requiring generalization so
that both categories of behavior will be represented in paper-and-pencil
knowledge tests. Special attention will be given to performance requiring
generalization to determine whether the paper-and-pencil format introduces
artificial cues that diminish the applicability of that format for that
category of behavior. If so, prototype measures (Subtask 3.5) may be
needed to represent content involving generalizing.
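The routing rule just described, from content category to candidate test format, can be written as a small lookup. The category labels follow the text. The enum, the function, and the example task components are hypothetical.

```python
from enum import Enum


class ContentCategory(Enum):
    """The three content categories described in the text."""
    NO_GENERALIZATION = "does not require generalization (e.g., applying a procedure)"
    GENERALIZATION = "requires generalization (e.g., applying rules and principles)"
    PERCEPTUAL_MOTOR = "requires perceptual and/or motor skill"


def candidate_format(category):
    """Return the candidate test format for a task component in the given category."""
    if category is ContentCategory.PERCEPTUAL_MOTOR:
        return "hands-on"
    if category is ContentCategory.GENERALIZATION:
        # Still paper-and-pencil, but flagged for the artificial-cue check;
        # a prototype measure (Subtask 3.5) may be needed instead.
        return "paper-and-pencil (review for artificial cues)"
    return "paper-and-pencil"


# Hypothetical task components.
components = {
    "assemble weapon": ContentCategory.PERCEPTUAL_MOTOR,
    "troubleshoot radio fault": ContentCategory.GENERALIZATION,
    "recite reporting sequence": ContentCategory.NO_GENERALIZATION,
}
for name, cat in components.items():
    print(f"{name}: {candidate_format(cat)}")
```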

A perhaps more difficult question will remain: how to select from within
these major categories? We hope to extend the procedures developed by
Wheaton, Fingerman, and Boycan (1978) in a rather restricted situation
(qualification testing for tank gunnery). In one of the few studies that
have analyzed sampling strategies in relation to testing purpose, Wheaton
et al. discussed the suitability of six possible bases for sampling test
content from a domain of training objectives: random sampling, frequency
of task performance, task difficulty and performance variability,
generalizability of objectives, criticality of objectives, and task commonality.
They decided to maximize coverage of the job domain by sampling on the
basis of task commonality. They selected tasks that had a maximal number
of elements in common with other tasks in the domain but a minimal number
of elements in common with other tasks on the test. In contrast to factors
such as frequency, criticality, and importance, task commonality is not
defined in terms of variables external to a task, but simply in terms of
the number of identical overt behaviors. Although judgments may ultimately be
required in establishing commonality, this approach offers possibilities
for superior reliability and objectivity.
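The commonality criterion can be sketched as a greedy selection over sets of task elements. The text does not say how Wheaton et al. combined "maximal overlap with the domain" and "minimal overlap with the test". The difference score, the greedy order, and the alphabetical tie-breaking below are all assumptions, and the element sets are hypothetical.

```python
def select_by_commonality(task_elements, n_tasks):
    """Greedy selection of test tasks by element commonality (hypothetical scoring rule).

    task_elements maps a task name to the set of overt behavioral elements it contains.
    At each step the chosen task maximizes
        (elements shared with all other tasks in the domain)
        - (elements shared with tasks already placed on the test),
    with ties broken alphabetically so the result is deterministic.
    """
    chosen = []
    remaining = set(task_elements)

    def score(task):
        elements = task_elements[task]
        domain_overlap = sum(
            len(elements & task_elements[other]) for other in task_elements if other != task
        )
        test_overlap = sum(len(elements & task_elements[c]) for c in chosen)
        return domain_overlap - test_overlap

    while remaining and len(chosen) < n_tasks:
        best = min(remaining, key=lambda t: (-score(t), t))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical domain: two clusters of related tasks plus one isolated task.
domain = {
    "A": {"a", "b", "c"},
    "B": {"a", "b"},
    "C": {"c", "d", "e"},
    "D": {"d", "e"},
    "E": {"f"},
}
print(select_by_commonality(domain, 2))
```

In this toy domain the second pick comes from the other cluster rather than duplicating the first, which is the coverage behavior the criterion aims for.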

Sampling on the basis of commonality of task elements will also be employed
in Task 3. Since the tasks to be encountered in this work are expected to
be more heterogeneous than those dealt with in the Wheaton et al. study,
establishing a basis for estimating commonality will require considerable
effort. Nevertheless, commonality is seen as the most promising approach to
the sampling issue.




OBJECTIVES 


1. Collect, analyze, and interpret information about existing training
criterion measures to augment the FY81/82 cohort data, and provide this
information to other tasks. This includes an evaluation of specific
measures represented in the Enlisted Master File and recommendations as
to what information now collected at schools but not entered in school
records and/or in the FY81/82 cohort data base should be
entered into the LRDB.

2. From the available measures in each of the 19 MOS, assemble the most
appropriate set of training performance criteria to be used in
validating the selection and classification measures developed by Task 2 and
in determining the incremental validity obtained by using training
performance in addition to pre-induction information in predicting
MOS-specific and Army-wide performance.

3.  Determine  the  extent  to  which  the  predictive  relationship  between 
training  performance  and  job  performance  Is  attributable  to  content 
learned  in  training  versus  content  learned  on  the  job  versus  general 
cognitive  ability. 

4. Advise Army trainers on how existing performance measures and scoring
procedures can be refined to increase reliability and the amount of
information obtainable from training measurement.


5. Construct an end-of-course comprehensive job knowledge test for each of
the 19 MOS. These will provide a set of common measures across all MOS
in Project A.

6. Develop prototype measures of components of training performance not
represented by existing measures or newly developed job knowledge
tests.




OVERALL  SUMMARY  OF  THE  PROCEDURE 




Criterion and predictive information about the performance of persons in
training in Army schools will be provided through analysis of performance
on recent and currently administered school measures and analysis of
performance on improved and newly developed measures. School performance
information will be generated in the following subtasks:


Literature review and planning. We have reviewed the literature on the
issues and methods of evaluating student achievement and are currently
drafting abstracts. We have prepared a master plan for Task 3 and will
subsequently submit field test plans as specified in the master plan.

Evaluation of existing measures. The performance of persons in training in
the FY81/82 and FY83/84 cohorts will be examined based on studies of
information available in school records and already entered in the LRDB by
ARI. The tests currently used in Army schools will be examined in
discussions with SMEs to determine the relation of their content to training
requirements and will be examined statistically to determine the adequacy
of their measurement characteristics.

Analysis of Army training and evaluation procedures. The primary purpose
of this subtask is to aid in determining the content validity of current
training exercises and training measures by identifying the processes by
which these components of the training system are derived and by
establishing their relation to job content.




Revision/construction of new comprehensive knowledge tests. To provide
improved measures to serve both as criteria of school performance and as
predictors of job performance, new comprehensive knowledge tests will be
developed in each of 19 MOS. Knowledge will be sampled based on
commonality across the MOS tasks, on estimates of frequency of error in
performance, and on representation of two classes of task components: those
requiring the application of procedures and those requiring the
generalization of information.

Development of prototype measures. To represent components of training
performance not represented by existing measures or newly developed job
knowledge tests, new indices will be derived from existing measures, and
other prototype measures will be developed.

Identification of training-relevant and job-relevant test content. To
provide a basis for interpreting predictive relationships between the new
comprehensive knowledge tests and subsequent job performance, the relevance
of the knowledge test items to training and job content will be determined
in two ways. First, training relevance will be determined empirically by
comparing the performance of entering trainees and graduating trainees, and
job relevance by comparing the performance of graduating trainees and job
incumbents. Second, job relevance and training relevance will be determined
judgmentally by trainers at Army schools.
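The empirical half of this comparison can be sketched at the item level as a difference in proportion correct between groups. This is a minimal illustration under stated assumptions, not the plan's specified procedure: responses are scored 0/1, relevance is taken as the gain in proportion correct from the earlier group to the later one, and the function names and the 0.10 cutoff (`min_gain`) are hypothetical.

```python
def p_values(responses):
    """Proportion correct for each item; responses holds one 0/1 row per examinee."""
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_people for i in range(n_items)]


def gains(earlier_group, later_group):
    """Per-item change in proportion correct from the earlier group to the later one."""
    return [late - early
            for early, late in zip(p_values(earlier_group), p_values(later_group))]


def classify_items(entering, graduating, incumbents, min_gain=0.10):
    """Flag each item as training-relevant and/or job-relevant.

    Training relevance: graduating trainees outscore entering trainees.
    Job relevance: job incumbents outscore graduating trainees.
    min_gain is a hypothetical cutoff chosen for illustration only.
    """
    training_gain = gains(entering, graduating)
    job_gain = gains(graduating, incumbents)
    return [
        {"item": i, "training_relevant": t >= min_gain, "job_relevant": j >= min_gain}
        for i, (t, j) in enumerate(zip(training_gain, job_gain))
    ]


entering = [[0, 0], [0, 1], [1, 0], [0, 0]]
graduating = [[1, 0], [1, 1], [1, 0], [0, 0]]
incumbents = [[1, 1], [1, 1], [1, 1], [0, 0]]
for row in classify_items(entering, graduating, incumbents):
    print(row)
```

In this toy data item 0 improves during training but not afterward, while item 1 improves only on the job. The judgmental ratings by trainers, the plan's second method, are not modeled here.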


Develop predictor and criterion composites of school measures. School
measures determined to have adequate reliable variance and content validity
will be assembled into integrated sets to serve as criteria for validating
initial predictors and as predictors of MOS-specific and Army-wide
performance.

Analyze predictive relationships and prepare reports. The results of
analyses of FY83/84 and FY86/87 cohorts will be presented in a technical
report and an instruments and measures report shortly after the new
predictors and new training measures are administered to the FY86/87
cohort.


Table 3-1 summarizes the subject matter expert (SME) and test subject support
requirements for the above subtasks. Figure 3-1, immediately following,
depicts the Task 3 schedule for the accomplishment of these subtasks.


TABLE 3-1

SME AND TEST SUBJECT SUPPORT REQUIREMENTS

                                                    SME        Trainees/Incumbents
                                              # per  Days per    # per  Days per
                                               MOS    person      MOS    person

Subtask 3.2 - Evaluate existing measures.
  Determine congruence of training content
  measures in school interviews.                3       .5
  Identify 400 trainee records in FY81/82
  cohort; arrange copying and mailing of
  records to HumRRO for both FY81/82 and
  FY83/84 cohorts.                              2       .5

Subtask 3.3 - Analysis of Army Training & Evaluation Procedures.
  Interview training developers &
  instructors.                                  3       .5

Subtask 3.4 - Construct/Administer comprehensive knowledge tests.
  Estimate error frequency in task elements.    3       .5
  Specify elements requiring generalization
  of knowledge.                                 3       .5
  Sort elements according to commonality.       3       .5
  Estimate perceptual-motor skill
  requirements.                                 3       .5
  Analyze knowledge requirements for
  elements; empirical testing of trainers
  to resolve differences.
  Try out knowledge tests.                      3       .5        100     .25

Subtask 3.5 - Development of prototype measures.
  Try out free-response items and
  synthetic tests.                                                100     .25
  Develop measures of general performance
  in training.                                  4       1


SME AND TEST SUBJECT SUPPORT REQUIREMENTS
Continued

                                                    SME        Trainees/Incumbents
                                              # per  Days per    # per  Days per
                                               MOS    person      MOS    person

Subtask 3.6 - Identify training-relevant & job-relevant items.
  Administer items to entering trainees.                          100     .25
  Obtain test scores for graduating trainees.                     100     .25
  Administer items to job incumbents in
  field tests.                                                    100     .5
  Sort knowledge elements according to SME
  judgment of training relevance.               3       .5

Subtask 3.7 - Develop School Criterion & Predictor composites.
  Select integrated set of criterion measures. 10**     .5

Subtask 3.8 - Analyze predictive relationships & prepare reports.
  Obtain school data for FY86/87 cohort.        3                2200 on  NA***
                                                                 average

** Field Grade Officers.
*** Not applicable since measures will have been operationally
    implemented in school setting.




PROCEDURES 


Subtask 3.1: Review Literature and Plan

Except for the construction of conventional tests of achievement,
methodologies and concepts appropriate to the analysis, development, and
application of measurement in training and work situations are not highly
developed. Methods for analyzing task and knowledge requirements for both
training and job performance, methods of sampling from training and job
domains, and procedures for classifying performance requirements and
relating them to test formats suitable for producing valid measurement are
some of the areas where procedures are not well defined. To benefit from the
most recent work and conceptualization in such areas, the measurement,
educational, industrial, psychological, and military research literature
will be reviewed. Data bases such as ERIC, NTIS, and RDIS will be examined,
and the following libraries will be included in the search: HumRRO,
CTB-McGraw Hill, Naval Postgraduate School, and University of California.
Recent work that has not yet been published and reports in the publication
process will be sought by personal communication with persons at the
various military human research laboratories and other government and
non-government research organizations.

Major topics to be included in the review are:

Job, task, and knowledge analysis

Test sampling

Behavior classification

Achievement, performance, and work sample test development

Performance rating development

Simulation and synthetic testing

Aptitude-performance relationships

Knowledge test-performance test relationships

Training-job performance relationships

Individual differences in training and job performance

Test bias


This subtask will also include preparation of the Task 3 draft research
plan and draft master plan. Following ARI review of the draft plans, the
revised plans will be submitted.

Subtask  3.2:  Evaluation  of  Existing  Measures 

The primary purpose of this subtask is to determine whether recent and
current training measures can serve as (a) criteria for prior selection and
classification measures, and (b) predictors of subsequent job performance.

First, an examination will be made of current measures in the ARI files, the
Enlisted Master File (EMF), and the TREDS files. Second, visits will be
made to selected schools to review the measuring instruments used to obtain
these scores, review any additional measures formally recorded by schools,
and identify measures not formally recorded but temporarily retained.
Scores from measures deemed adequate will be added to the records of the
FY81/82 cohort in the LRDB and evaluated as criteria for such initial
predictors as ASVAB. The scores from these measures will also be added to
the LRDB for the FY83/84 cohort, where they can be evaluated both as
criteria and as predictors.


Evaluation of FY81/82 cohort data and selection of MOS. With the aid of
Task 1 staff, a compilation will be made of all school performance measures
available in the ARI files, the EMF, and TREDS for the selected MOS.
The LRDB shows the date of enrollment in the course and whether an MOS was
awarded and at what skill level (presumably Skill Level 1) and, if not, the
reason for attrition, the disposition of the student (such as "recycled" or
"early graduation"), and its effective date, as well as course grade and
class rank. What relevant information is available in the EMF and TREDS
has not yet been determined.

Collection of data for qualitative analysis. Among the MOS that have been
selected for initial study in Project A are:


MOS    Title                           Training Site

05C    Radio TT Operator               Ft. Gordon, GA
63B    Vehicle & Generator Mechanic    Ft. Dix, NJ
                                       Ft. Leonard Wood, MO
71L    Administrative Specialist       Ft. Jackson, SC
95B    Military Police                 Ft. McClellan, AL

Each of these posts is the training site for several of the larger MOS in
the FY81/82 cohort. Three other posts, Ft. Bliss, TX, Ft. Sill, OK, and
Ft. Sam Houston, TX, are also training sites for a number of the larger MOS
and are close enough to the first five to minimize travel costs. We plan,
therefore, to visit at least these eight posts to collect quantitative
information about the training measures that generated the FY81/82 cohort
data.

Using an ARI-furnished printout, we have identified the courses taught at
these eight posts that are included in the FY81/82 cohort and tabulated the
number of records in the data base for each. The number of records ranges
from 1,672 for the 91B Medical Specialist course at Ft. Sam Houston to 10
for the 32H Fixed Station Radio Repairer course at Ft. Gordon, GA. We expect
shrinkage in the data base as the Accessions File, TREDS, and EMF are merged
with the LRDB. In addition, some partitioning of the available records will
occur, since only a majority, not all, of the ASVAB scores in the Accessions
File are based on the equivalent forms 8, 9, or 10. (Scores based on earlier
forms of the ASVAB are not equivalent to the 8-9-10 set; therefore, for
certain analyses the training records for persons with ASVAB scores from the
earlier forms will have to be treated as a separate subset.)

For these reasons, most courses with fewer than 500 records in the LRDB
were removed from further consideration. The courses with 500 or more
records are listed in Table 3-2. It should be noted that Task 1 has
informed us that the files have not yet been cleared of duplicate data,
i.e., one subject's data repeated several times. Once the file has been
checked and cleared of mispunched data, the number of records within each
MOS is likely to change.
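The record screening described above (clearing duplicate records, dropping courses below 500 records, and setting aside records whose ASVAB scores come from forms other than 8, 9, or 10) can be sketched as follows. This is a minimal illustration only: the record layout and the field names `subject_id`, `course`, and `asvab_form` are hypothetical, and the sketch does not model the judgment by which some smaller courses are retained.

```python
from collections import Counter

EQUIVALENT_FORMS = {8, 9, 10}  # ASVAB forms treated as equivalent in the plan
MIN_RECORDS = 500              # course-size threshold used in the plan


def deduplicate(records):
    """Keep the first record for each (subject_id, course) pair; field names are hypothetical."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["subject_id"], rec["course"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def eligible_courses(records, min_records=MIN_RECORDS):
    """Courses with at least min_records distinct records after de-duplication."""
    counts = Counter(rec["course"] for rec in deduplicate(records))
    return {course for course, n in counts.items() if n >= min_records}


def partition_by_form(records):
    """Split records into (forms 8-10, all other forms) so each set is analyzed separately."""
    equivalent = [rec for rec in records if rec["asvab_form"] in EQUIVALENT_FORMS]
    other = [rec for rec in records if rec["asvab_form"] not in EQUIVALENT_FORMS]
    return equivalent, other
```

De-duplicating before counting matters here because, as noted above, the files still contain repeated subject records that would otherwise inflate course totals.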

Several additional courses are currently under consideration even though
the number of records in the LRDB is less than 500. These are courses that
ARI has identified as having a good distribution on the training measures,
i.e., sufficient variance to suggest that useful differentiation among
trainees was made. These courses are located at Ft. Bliss, TX, currently
identified as one of the eight locations already proposed, and at
Ft. Eustis, VA.


Table 3-2

MOS with 500 or More Records in ARI Data Base

MOS   Title                                          Records   Training Site

05B   Radio Operator                                     797   Ft. Gordon, GA
05C   Radio Teletype Operator                           1046   Ft. Gordon, GA
13E   Cannon Fire Direction Specialist                  1164   Ft. Sill, OK
13F   Fire Support Specialist                           1133   Ft. Sill, OK
15D   LANCE Crewmember                                   657   Ft. Sill, OK
15E   PERSHING Missile Crewmember                        638   Ft. Sill, OK
16B   HERCULES Missile Crewmember                        607   Ft. Bliss, TX
16R   ADA Short Range Gunnery Crewman                    939   Ft. Bliss, TX
16S   MANPADS Crewman                                    887   Ft. Bliss, TX
31M   Multichannel Communications Equipment Operator    1301   Ft. Gordon, GA
31V   Tactical Communications Systems Operator          1133   Ft. Sill, OK
32D   Station Technical Controller                       513   Ft. Gordon, GA
36C   Wire System Installer/Operator                     590   Ft. Gordon, GA
36K   Tactical Wire Operations Specialist               1304   Ft. Gordon, GA
51B   Carpentry and Masonry Specialist                   588   Ft. Leonard Wood, MO
51R   Electrician                                        573   Ft. Leonard Wood, MO
54E   NBC Specialist                                     573   Ft. McClellan, AL
62B   Construction Equipment Repairer                   1155   Ft. Leonard Wood, MO
62E   Heavy Construction Equipment Operator             1081   Ft. Leonard Wood, MO
62F   Lifting and Loading Equipment Operator            1018   Ft. Leonard Wood, MO
63B   Light Weight Vehicle/Power Generation Mechanic     990   Ft. Dix, NJ
63B   Light Weight Vehicle/Power Generation Mechanic    1680   Ft. Jackson, SC
72E   Telecommunications Center Operator                 962   Ft. Gordon, GA
75D   Personnel Records Specialist                       606   Ft. Jackson, SC
75E   Personnel Actions Specialist                       950   Ft. Jackson, SC
76Y   Unit Supply Specialist                            1262   Ft. Jackson, SC
82C   Field Artillery Surveyor                           927   Ft. Sill, OK
91B   Medical Specialist                                1672   Ft. Sam Houston, TX
91C   Patient Care Specialist                           1204   Ft. Sam Houston, TX
91E   Dental Specialist                                  553   Ft. Sam Houston, TX
94B   Food Service Specialist                            535   Ft. Dix, NJ
94B   Food Service Specialist                           1298   Ft. Jackson, SC


Interview procedures and qualitative analysis. Visits will be made to the
training sites to examine the measures reported in the computerized files
and to identify any other school measures that might predict school success
or subsequent job performance or serve as criteria for the initial
predictors. During our visits to the schools, measures provided for
qualitative analysis will first be classified into paper-and-pencil,
hands-on (performance), and other measures (instructor ratings of training
performance, number of class hours/days needed to complete course, number
of times recycled, etc.).


The number of measures falling in the first or even the second category may
be quite large. For example, weekly or even daily spot quizzes
(paper-and-pencil) or spot checks (hands-on) might be given in some courses.
At a minimum, we expect to find a measure following each training module or
objective, depending upon the course. When large numbers of measures are
taken, we may have to sample them, taking, for example, the first daily
quiz and every fifth quiz thereafter. A similar sampling plan can be
developed for other measures available in the school records or temporarily
held by instructors. Prior coordination with the schools will facilitate
development of such a plan.
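The example rule, taking the first daily quiz and every fifth quiz thereafter, amounts to systematic sampling with a fixed step. A minimal sketch, assuming the measures are listed in course order; the function name and the `step` parameter are illustrative, not part of the plan:

```python
def sample_measures(measures, step=5):
    """Return the first measure and every `step`-th measure after it.

    With step=5 this keeps positions 1, 6, 11, ... in course order.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(measures)[::step]


daily_quizzes = [f"quiz {day}" for day in range(1, 13)]
print(sample_measures(daily_quizzes))  # ['quiz 1', 'quiz 6', 'quiz 11']
```

Changing `step` trades coverage of the course against the number of measures that must be copied and entered.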


For  each  course  we  will  examine: 


(1) A syllabus with course objectives, course schedule, and
lesson outlines.

(2) Copies or descriptions of current measures, including
instructions for administration and scoring.

(3) Test scores of the FY81/82 cohort.

(4) Any known statistical properties of the test measures.

(5) Copies of selected individual records from each of a
number of courses.


The following questions are representative of the kinds of information to
be derived in interviews with instructors and training managers.


A.  Characteristics  of  Measures 


1. What training objectives or sub-objectives is this test
intended to measure? What portion of the course does this
test cover?

2. Why was the particular format (paper-and-pencil, hands-on)
chosen?

3. Do the individual items match the elements of training
content within the objective or sub-objectives?

4. How were the items generated?

5. Is there a pool of items? How were the items sampled from
the pool?

6. Is the item sequence reasonably ordered, e.g., one that
reflects the normal sequence of performance?

7.  What  item  format  is  used? 

a.  True-false 

b.  Matching 

c.  Multiple-choice 

d.  Ordering  sequence 

e.  Identifying  right  (or  wrong)  procedures 

f.  Open  ended  response  or  completion 

g. Rank ordering importance

h.  A  mix  of  formats 


8. Which of the following kinds of behavior are required to
master the objective?

a. Decision-making

b. Application of rules

c. Selection of strategies

d. Troubleshooting

e. Problem solving

f. Sustained vigilance

g. Immediate (automatized) response to prevent injury to
personnel or damage to equipment

h. Attention to fine detail

i. Motor skill

j. Perceptual-motor skill

k. Speeded response

l. Unusual strength

m. Unusual endurance

n. Other


B.  Administration  of  Measures 


1.  Is  the  test  open  book  or  closed  book? 

2.  Is  the  test  closely  or  loosely  proctored? 

3.  Is  the  administration  standardized? 

a.  Is  testing  carried  out  at  a  central  facility? 

b.  How  are  test  administrators  trained? 

c. Are the instructions given ad lib or read by the
administrator?

4.  Are  equivalent  forms  available? 

5.  What  are  the  procedures  to  keep  the  test  secure? 

6.  Are  questions  permitted  during  the  test? 


C.  Scoring  of  Measures 

1. Is there any subjectivity in scoring?

2. Is the pass-fail decision on the objective covered by
the test

a. criterion referenced (pre-set criterion, absolute), or

b. norm referenced?

3. Is testing time part of the measure? Do only some examinees
finish, or do all finish? If only some, what is the proportion
finishing?

4. What is the contribution of the measure to pass-fail on the
course?

5. How is the pass-fail decision made at the end of the course?




D. Statistical Properties of Measures, If Known

1. Reliability (stability).

2. Internal consistency.

3. Sensitivity to training.

4. Mean differences between racial or gender groups. Differences,
if any, to be analyzed to determine source and provide a
basis for corrective action.

5. Validity.


Similar questions will govern the analysis of the hands-on measures,
with certain additions. For example: (a) Is the performance process- or
product-scored? What were the reasons for the choice? (b) Is cuing
allowed? What are the rules? (c) Is role-playing on the part of someone
else required to carry out a particular performance? Who plays the role,
and how is this person trained? What artificialities occur as a
consequence of cuing? As a result of modified task boundaries?


Examples  of  criteria  for  the  qualitative  assessment  of  measures  are  given 
below. 

1. The congruence between test items and training content for a specific
objective. The number of teaching points in the lesson outlines
approximates the required coverage of a measure. When each test item
is classified according to the teaching point it represents, redundancies,
contaminants, and deficiencies in the measure can be identified.

An index of congruence can be defined as:

$$\text{Index of congruence} = \frac{\text{Total number of test items} - \text{Number of redundant items} - \text{Number of contaminants}}{\text{Number of teaching points}}$$

For example, the test for a module of a course covered by 50 teaching
points contains 45 items, 12 of which are redundant and 3 of which
are irrelevant to the module (contaminants). The index of congruence
is then:

$$\frac{45 - 12 - 3}{50} = \frac{30}{50} = .60$$

(The appropriateness of a given level of congruence will depend on the
type of content being assessed.)

2. Test administration procedures. Points will be assigned for "favorable"
responses to the questions on test administration (B1-B6 above), and a
total score will be generated that indicates the relative adequacy of a
procedure. Whether a response is "favorable" or not will depend upon
the characteristics of the test. For example, a closed-book test would
ordinarily be given a point, unless it happens to cover a procedure
that could be carried out on the job with the aid of a TM or some other
job aid.
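Both criteria reduce to simple arithmetic. The sketch below computes the index of congruence as defined in criterion 1 and reproduces the worked example. The administration score is an assumption-laden illustration of criterion 2: the question keys, the choice of which answer counts as "favorable" for each of B1-B6, and the rule that an open-book test earns the B1 point when a TM or other job aid is used on the job are all hypothetical.

```python
def congruence_index(total_items, redundant_items, contaminants, teaching_points):
    """(total items - redundant items - contaminants) / number of teaching points."""
    if teaching_points <= 0:
        raise ValueError("teaching_points must be positive")
    return (total_items - redundant_items - contaminants) / teaching_points


# Hypothetical encoding of questions B2-B6: each key is phrased so that True
# is the answer assumed to be "favorable".
FAVORABLE_KEYS = (
    "closely_proctored",     # B2
    "standardized",          # B3
    "equivalent_forms",      # B4
    "security_procedures",   # B5
    "no_questions_allowed",  # B6 (assumed favorable)
)


def administration_score(answers, job_aid_used_on_job=False):
    """Count favorable answers to B1-B6 (hypothetical scoring rules).

    B1: a closed-book test normally earns the point; when the task is done on
    the job with a TM or other job aid, an open-book test earns it instead.
    """
    b1_point = answers["closed_book"] != job_aid_used_on_job
    return int(b1_point) + sum(bool(answers[key]) for key in FAVORABLE_KEYS)


print(congruence_index(45, 12, 3, 50))  # 0.6
```

The plan expects the scoring rules to be revised after the pilot interviews, so a table-driven encoding like `FAVORABLE_KEYS` keeps those rules easy to change.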

The remaining qualitative analyses will proceed along similar lines. The
criteria and methods of assigning scores to the various descriptors will no
doubt be modified and improved after reviewing actual school measures
during the pilot administration of the interview procedures. The purpose
of these qualitative analyses is to identify measures that are likely to
have sufficient reliability and validity to warrant being added to the
FY81/82 cohort data or collected from the FY83/84 cohort, as well as
to identify measures that could be improved to increase their reliability
and validity. In addition, some of the information obtained in the
qualitative analyses will be provided to Task 2 for use in their analysis of
predictor and criterion constructs. Descriptions of types of behavior
believed to be required to master training objectives, as identified in
interviews with instructors and training managers, will be particularly
useful in the Technical Review conducted by Task 2 staff aimed at
identifying the best predictor set. The pertinent training behaviors will be
defined, classified by objective and training situation (e.g., lock-step,
self-paced instruction), and provided to Task 2 with a description of the
data collection procedure.

The outcomes of the qualitative analyses will determine what new measures
will be added to the LRDB for quantitative analysis. The outcomes may
also cast doubt on some measures currently in the data base. For example,
a time-to-course-completion measure may not have included extra or
after-hours study time. Or the administration of an end-of-course
comprehensive test (EOCCT) may be so poorly standardized as to vitiate the
scores.

Collection of records for quantitative analysis for FY81/82 cohort. Since
the costs associated with reproducing, editing, and entering all available
records may be prohibitive in some MOS, printouts will first be obtained of
the individual records of the enlisted personnel currently on file in the
LRDB for the MOS selected. From the printouts, a sample of up to 400 will
then be selected for each MOS, taking into account the date of ASVAB
testing, sex, and ethnicity. The school records of these 400 will then be
identified and copied. Should initial contact with the schools make it
apparent that such an approach is not feasible (that is, if searching
for specific records is too cumbersome), 400 records will be sampled
sequentially from all FY81/82 classes and sorted later for date of ASVAB,
sex, and ethnicity.
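One way to draw the sample of up to 400 while taking ASVAB test date, sex, and ethnicity into account is proportional stratified sampling. The plan does not specify the allocation, so proportional allocation (with slots lost to rounding given to the strata with the largest fractional quotas) is an assumption here, as are the field names `asvab_period`, `sex`, and `ethnicity`.

```python
import random
from collections import defaultdict


def stratified_sample(records, n=400, keys=("asvab_period", "sex", "ethnicity"), seed=0):
    """Draw up to n records, each stratum represented in proportion to its size."""
    if len(records) <= n:
        return list(records)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in keys)].append(rec)
    total = len(records)
    quotas = {s: n * len(members) / total for s, members in strata.items()}
    allocation = {s: int(q) for s, q in quotas.items()}
    # Give the slots lost to rounding down to the strata with the largest remainders.
    leftover = n - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for s in by_remainder[:leftover]:
        allocation[s] += 1
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    sample = []
    for s, members in strata.items():
        sample.extend(rng.sample(members, allocation[s]))
    return sample
```

If small groups need more than their proportional share for later analyses, the quotas would be set directly rather than computed from stratum size.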

Collection of records for FY83/84 cohort. Currently there are no data in
the LRDB for the FY83/84 cohort. The measures determined by the qualitative
analyses to be most promising will be input to the LRDB. Arrangements
will be made with schools to send us records on a continuing basis as each
new class completes training, starting in July 1983. Records for every
trainee entering a course will be requested to increase the likelihood
that the number of records is sufficient to conduct the follow-up data
collections during the first and second tours, when considerable attrition
can be expected. Schools will be requested to include in school records
additional measures that appear promising on the basis of the qualitative
analysis but which were not included in the data collected by ARI for the
FY81/82 cohort. Also, emphasis will be given to measures which appeared
qualitatively acceptable for the FY81/82 cohort but which were found so
incomplete in the data base that no quantitative analyses were done.

In addition, if any of the 19 MOS selected for Project A are not in the
group of MOS selected for the FY81/82 cohort, the measures for such MOS
will be assessed qualitatively and quantitatively. That is, the analyses
performed for the FY81/82 cohort measures will be repeated for any MOS
not in our FY81/82 set. Just as before, this will entail visits to schools
to become knowledgeable about their measures and to obtain records of
trainee scores.


Quantitative Analysis. The quantitative analysis of the existing measures
will focus on the adequacy of the distributions of scores from these
measures to support the statistical analysis projected for them. Clearly,
the number of scores available is one important criterion for deciding
whether a set of scores is a suitable basis for validation and prediction
research; the extent to which the measures appear to discriminate among
the trainees is clearly another. However, there are no convenient rules of
thumb for accepting or rejecting a distribution on either criterion. It is
necessary to devise some scheme whereby distributions of scores currently
available and to be encountered in the future can be classified according
to their potential utility for further analyses.

To this end, a set of some 25 distributions of scores will be selected out
of the data from the FY81/82 cohort (once those data have been purged of
redundancies, drops, and recycles) and out of the data from other measures
currently in use. Such indices of central tendency, variability, skewness,
and kurtosis as the mean, median, mode, minimum and maximum scores and
their standard score equivalents, score range, score variance, standard
deviation, semi-interquartile range, coefficient of variation, Pearson's
second coefficient of skewness, a measure of kurtosis, the quartile scores,
and the scores at ±1 standard deviation will be determined for each
distribution. In addition, a histogram will be made of each distribution.
A total of 15 to 20 experts in test construction and statistical analysis
will be selected from the staff of Project A and asked to rate each of
these distributions and their associated statistics on a five-point scale
as very useful, useful, marginally useful, doubtfully useful, or not useful
in the statistical analyses involved in validation and prediction.
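The indices listed above can be computed with Python's standard `statistics` module (Python 3.8 or later for `fmean` and `quantiles`). This sketch makes choices the plan leaves open: population rather than sample moments, the "inclusive" quartile method, and excess kurtosis (fourth-moment ratio minus 3) as "a measure of kurtosis". Pearson's second coefficient of skewness is computed as 3(mean - median)/SD; the function name `describe` is hypothetical.

```python
import statistics as st


def describe(scores):
    """Summary indices for one distribution of scores (population formulas)."""
    n = len(scores)
    mean = st.fmean(scores)
    sd = st.pstdev(scores)
    if sd == 0:
        raise ValueError("distribution has no variance")
    median = st.median(scores)
    q1, q2, q3 = st.quantiles(scores, n=4, method="inclusive")
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    low, high = min(scores), max(scores)
    return {
        "mean": mean,
        "median": median,
        "mode": st.mode(scores),  # first mode encountered if several tie
        "min": low,
        "max": high,
        "z_min": (low - mean) / sd,
        "z_max": (high - mean) / sd,
        "range": high - low,
        "variance": m2,
        "sd": sd,
        "semi_interquartile_range": (q3 - q1) / 2,
        "coefficient_of_variation": sd / mean,
        "pearson_skew_2": 3 * (mean - median) / sd,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
        "quartiles": (q1, q2, q3),
        "minus_1_sd": mean - sd,
        "plus_1_sd": mean + sd,
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["mean"], stats["sd"], stats["quartiles"])  # 5.0 2.0 (4.0, 4.5, 5.5)
```

Producing the same dictionary for every distribution gives the expert raters a uniform sheet, and later makes it straightforward to relate their ratings to these indices.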

Once the distributions have been rated by the experts, analysis will show
what characteristics of a distribution can conveniently be used for
accepting or rejecting it as a basis for further analyses, and these
characteristics will be used as the basis for accepting or rejecting
existing measures to be retained in the LRDB.

As FY83/84 cohort data are input into the LRDB, descriptive statistics
(means, variances, scatter plots, etc.) and correlation matrices will be
computed to determine whether some measures should be dropped from further
consideration. The descriptive statistics will be used to screen such
obviously inadequate measures as those with no variance.

We will stay in touch with the schools to identify changes made in lesson
outlines and school measures during the data collection period for the
FY83/84 cohort. Changes in course outlines or training measures, such as
adding or deleting blocks of material or altering the match between
training and test content, could seriously affect the comparability of
measures between classes. (This will be a continuing problem within and
between cohorts.)

Certain of the school measures that were originally input to the LRDB from
the ARI files, or added during this study, will no longer be of interest.
These will include: (a) measures with intractable shortcomings uncovered
in qualitative analysis, e.g., measures that appear impossible to
standardize; (b) measures that fail to differentiate among individuals;
and (c) measures that entered the correlation matrix but appeared to be
minimally related to anything else.
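
Criteria (b) and (c) lend themselves to a simple automated first pass
before any judgmental review. The sketch below is illustrative only. It
assumes each measure is a list of scores aligned on the same trainees. The
measure names and the cutoff `min_abs_r = 0.10` for "minimally related to
anything else" are hypothetical, since the plan sets no numeric threshold.

```python
import statistics


def pearson_r(x, y):
    """Pearson correlation; assumes equal-length, non-constant lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def screen_measures(measures, min_abs_r=0.10):
    """Flag measures to drop from further consideration.

    `measures` maps a (hypothetical) measure name to trainee scores.
    `min_abs_r` is an assumed cutoff, not a value from the plan.
    """
    flagged = {}
    usable = {}
    # Criterion (b): a measure with no variance cannot differentiate.
    for name, scores in measures.items():
        if statistics.pvariance(scores) == 0:
            flagged[name] = "no variance"
        else:
            usable[name] = scores
    # Criterion (c): the largest |r| with every other measure is small.
    for name, scores in usable.items():
        others = [abs(pearson_r(scores, s)) for o, s in usable.items() if o != name]
        if others and max(others) < min_abs_r:
            flagged[name] = f"max |r| = {max(others):.2f}"
    return flagged
```

Flagged measures would then be reviewed by the research staff rather than
dropped automatically. The correlation rule only identifies candidates.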

On the basis of the qualitative and quantitative analyses of existing
measures, we will prepare a report listing our recommendations for
improving the administration and scoring of measures in each of the 19 MOS
in Project A. Upon approval of the ARI COR, this report will be forwarded
to the school commandant and the director of training at each site.

With the approval of the director of training, we will brief each
recommendation to the instructors and course chief involved, and describe
specifically each modification that we would like to see implemented. In
particular, our briefing to the instructors will explain the rationale for
each change in terms of the expected improvement in the reliability and
validity of the information obtained about each trainee's level of mastery
of course objectives.

Subtask 3.3: Analysis of Army Training and Evaluation Procedures

An essential aspect of the development and improvement of school measures
is the determination of the relation of these measures to actual job
requirements. Even more fundamental is the relation of the content of
training and its measures to the requirements of the job itself. The
primary purpose of this subtask is to determine the content validity of
current training and training measurement by: (a) identifying the
processes by which these components of the training system are derived,
and (b) establishing their relation to job content. The general method
will be to track the requirements identified by job analysis through the
training development process to their representation or nonrepresentation
on within-course and end-of-course evaluations.

In the ideal training development system, there would be a simple
one-to-one correspondence between job requirements at one end of the
development process and the content of trainee evaluation at the other.
Experience with attempts to reach this ideal has demonstrated that such a
correspondence is difficult to achieve and not likely to be typical of the
real-life training process. In practice, the content of training can arise
from several sources in addition to identifiable job requirements (past
practices, command preferences, rational analysis, instructor proclivity,
tradition, etc.). The focus of this subtask will not be a judgmental
comparison among MOS as to the quality of training development efforts,
but rather a determination of how the content of training and the content
of trainee evaluation come into being.

Tracking the progress of training development through its various stages
requires on-site interviews with training developers themselves and
examination of training development products. In this subtask,
semi-structured interviews will be used to collect information across
individual courses. Interviewers will follow a guide, but they will be
able to deviate into promising areas of inquiry. By using this method, it
is possible to locate the source of training content as well as the
eventual destination of training development products (e.g., training
objectives, training activities, training measures). By interviewing
persons who are responsible for different parts of the training
development process, it is possible to obtain information about the
adequacy of the output of one phase for serving as the input to the next
phase, and to piece together a picture of the whole process as it actually
occurred for a given course.

A recurring problem in tracing the relation of job requirements to
training activities and measures should be mentioned. A common component
of the training development process is the specification of training
objectives as an intermediate step in the derivation of training. We have
found (Vineberg & Joyner, 1980) that while the specification of training
objectives is virtually universal, the procedures used to identify
objectives are highly variable and frequently unclear. There is evidence
that objectives are often prepared after the fact, derived from training
content rather than used to generate it. Where records are maintained,
formats for displaying the relation between tasks and training objectives
often make it hard to determine which objectives have been derived from a
given task. That is, tasks are often listed by objective, rather than
objectives by task.

Accordingly, we plan to "map" the derivation of training development
products (e.g., training requirements, training objectives, training
activities, and training measure content) within a matrix that
cross-references these products by job task requirements. To accomplish
this mapping, the analysis will treat the various aspects of the training
development process independently.
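
The mapping amounts to inverting records kept "by objective" into a matrix
kept "by task." The minimal sketch below assumes each school record can be
reduced to a (product type, product id, tasks covered) tuple. All task and
product identifiers are hypothetical, and the four product types follow the
examples listed above. The same matrix gives one way to classify each
selected task as represented or not represented in training and in
training measurement.

```python
from collections import defaultdict

# Product types taken from the examples in the text.
PRODUCT_TYPES = ("requirement", "objective", "activity", "measure")


def build_task_matrix(records):
    """Invert (product_type, product_id, tasks) records into
    task -> product_type -> set of product ids."""
    matrix = defaultdict(lambda: {p: set() for p in PRODUCT_TYPES})
    for product_type, product_id, tasks in records:
        for task in tasks:
            matrix[task][product_type].add(product_id)
    return dict(matrix)


def representation(matrix, selected_tasks):
    """Classify each selected task as represented or not represented in
    training (activities) and in training measurement (measures)."""
    empty = {p: set() for p in PRODUCT_TYPES}
    result = {}
    for task in selected_tasks:
        row = matrix.get(task, empty)
        result[task] = {
            "in_training": bool(row["activity"]),
            "in_measurement": bool(row["measure"]),
        }
    return result
```

A task that appears in the job analysis but in no row of the matrix falls
out as not represented in either training or measurement.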


Job Requirements Analysis. Personnel who know about the development of the
course will be interviewed, and job/task analysis data examined, to
determine the following:

o  how job requirements were identified or defined;

o  how training requirements were identified or defined;

o  how training requirements were derived from job requirements; and

o  how training requirements were specified/transmitted to training
designers/developers.


Of special concern at this point is the question of what happens to
job/task analysis information after it has been developed.

Instruction Developers. Personnel who design and develop training will be
interviewed, and records of training design decisions will be examined to
determine: (a) how job/task analysis information is put to use; (b) how
the content of lesson plans, texts, and other media of instruction is
selected; (c) what formal specifications are given to training developers;
etc. (Training developers may be the trainers themselves or other
personnel. Even when schools are organized into separate divisions to
carry out different training development functions, considerable
functional overlap has been found.) In this subtask, questions will
concentrate on how instructional content is derived, developed, and
evaluated rather than on how decisions about instructional technique are
made (e.g., media, methods, self-paced vs. group-paced), even though
information about the latter will undoubtedly emerge in the course of
interviews.

Test Developers. Developers of evaluation instruments will be interviewed,
and evaluation instruments examined, to determine how the content of
trainee evaluation is selected. Of special concern at this point is how
the domain to be sampled during criterion measurement has been determined.
Are tests based on training content (as represented in texts, lesson
plans, and other media), job/task analysis information, or some other
source? Are tests constructed by the instructors who conduct training, or
by other personnel?


Trainers' descriptions of the output from and relationships among
components of the training development process will be confirmed by
examining lesson plans, texts, tests, and other developmental products.
Each job task selected for training will be classified as represented or
not represented in training and training measurement.

The information collected in this subtask will be used subsequently to
interpret predictive relationships that are obtained between measures of
training performance and measures of MOS-specific performance. In
addition, the information collected will be used in the development of the
knowledge tests and prototype measures (Subtasks 3.4 and 3.5).

Subtask 3.4: Construction/Revision of New Comprehensive Knowledge Tests

Improved knowledge tests will be developed in each of the 19 selected MOS,
both to serve as criterion measures of school performance and to provide a
basis for analyzing the mechanism by which school performance is
predictive of subsequent job performance. Test items will be derived from
requirements of tasks specified in Soldier's Manuals for which task
performance analysis (TPA) information is available. Where it is
appropriate to make the distinction, the knowledge tests will contain
subsets of items reflecting training requirements and job requirements. In
some MOS the content of training and the requirements of the job will be
more or less identical; in others, information acquired in training may
not be retained because it is less relevant to job performance. Where a
sufficient number of test items can be developed for both classes of
information, we believe it will be possible to provide a more accurate
estimate of the role of an individual's capacity to learn, versus his or
her acquisition of job-specific information, as a predictor of job
performance.

Since it is efficient to use test items that are currently used in Army
schools wherever possible, the plan for test development calls for using
existing test items whenever they are content valid (as determined in
Subtask 3.2) and psychometrically adequate.

Test item development will begin for a subset of 6 of the 19 MOS (05C,
13B, 19E/K, 63B, 71L, and 95B) immediately after this Research Plan is
approved and the first Troop Support Request is received. Test development
will occur for two groups of 6 MOS and one group of 7 MOS on a staggered
basis during the approximate period October 1983 to December 1985. The
major steps in the construction of improved test items and their assembly
into end-of-course comprehensive tests are as follows:

1.  Obtain TPA information for each task in the Soldier's Manual for each
selected MOS. TPA information will be obtained from the analyses conducted
by RCA (Contract No. DABT 60-81-C-0017) and supplemented, as required, by
other task analysis information from Army schools, from SQT notices and
tests from Task 5, from job publications such as field manuals and
technical manuals, and from analyses performed to develop Training
Extension Course (TEC) lessons. Preliminary inspection of a small sample of
the RCA TPAs is encouraging, and these analyses are expected to be the most
reliable source of task information for the MOS not treated in Task 5.


2.  Subject the TPA information for each task to the following analyses:


a.  Three SME independently estimate the frequency of errors in
performance (5-point scale) for each behavioral element of the TPA.

b.  Three SME independently designate elements of behavior in the TPA
that require application of a rule or principle and thus involve
generalization of information (e.g., how to place an antenna for best
reception). Discrepancies are resolved by the research staff.

c.  Research staff sort elements of the TPA into common and noncommon
behaviors across tasks to identify the generality of elements. The basis
for assigning commonality, and the level of behavioral description in the
TPA at which it can be specified, will be determined when TPA data become
available.

d.  Two SME-research staff teams independently judge the presence or
absence of perceptual and/or motor skill in the performance of each
behavioral element of the TPA. Those elements judged to contain these
requirements become candidates for hands-on or prototype measures (see
Subtask 3.5). Where perceptual-motor behavior calls for a hands-on test,
conventional knowledge test items will also be developed to permit
examination of the relationships between hands-on and paper-and-pencil
measurement of the same job performance component.
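
The independent judgments in analyses a, b, and d have to be combined
element by element before the later steps can use them. The sketch below
shows one way to do that. The aggregation rules are assumptions, not rules
stated in the plan: error ratings are averaged over the three SME, any
disagreement on the rule/principle designation is routed to the research
staff, and an element becomes a hands-on candidate if either team judges
perceptual-motor skill present. Element ids are hypothetical.

```python
import statistics


def aggregate_sme_judgments(element_ids, error_ratings, rule_flags, skill_flags):
    """Combine independent judgments for each TPA behavioral element.

    error_ratings[e]: three 1-5 error-frequency ratings (analysis a).
    rule_flags[e]:    three True/False rule/principle designations (b).
    skill_flags[e]:   two True/False perceptual-motor judgments (d).
    The combination rules below are illustrative assumptions.
    """
    summary = {}
    for e in element_ids:
        rules = rule_flags[e]
        skills = skill_flags[e]
        summary[e] = {
            "mean_error": statistics.mean(error_ratings[e]),
            # Any disagreement among the three SME goes to research staff.
            "rule_needs_resolution": len(set(rules)) > 1,
            "rule_majority": sum(rules) >= 2,
            "teams_disagree": skills[0] != skills[1],
            # Assumed: either team's "present" judgment suffices.
            "hands_on_candidate": any(skills),
        }
    return summary
```

The `rule_majority` value is only a provisional label. Under the plan, the
research staff make the final call on any element flagged for resolution.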


3.  On the basis of information generated in Step 2, and for the 25
percent of TPA elements with the highest error ratings, compute the
proportion of procedural elements to total elements (the number of
elements involving application of rules/principles [generalization of
information] plus the number of procedural elements). Repeat the
computation for the 25 percent of the elements with the greatest
commonality. Compute the mean proportion of procedure application to total
elements for the two domains. This average proportion provides a basis for
determining the selection ratio of procedure to rule/principle items in
the tests. This procedure has been devised on the supposition that
procedural elements may account for the bulk of performance requirements,
while errors may be disproportionately associated with rule/principle
application elements. Thus the procedure employed will cause the
proportion of procedure and rule/principle application represented on the
test to reflect the proportion of these two domains in the more
error-prone parts of performance.
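
Step 3 reduces to two proportions and their mean. The sketch below
represents each element as a dict with hypothetical keys `error` (mean
error rating), `commonality`, and `rule` (True for rule/principle
application, False for procedural). Two details are assumptions that the
plan leaves open: the top 25 percent is rounded up to a whole number of
elements, and ties are broken by input order.

```python
import math


def procedural_proportion(elements, key, top_fraction=0.25):
    """Share of procedural elements among the top fraction of elements
    ranked by `key`. Rounding up the slice size is an assumption."""
    ranked = sorted(elements, key=lambda e: e[key], reverse=True)
    k = max(1, math.ceil(len(ranked) * top_fraction))
    top = ranked[:k]
    # Total elements in the slice = rule/principle + procedural = k.
    return sum(1 for e in top if not e["rule"]) / k


def selection_proportion(elements):
    """Mean of the error-based and commonality-based proportions: the share
    of test items to draw from procedure-application elements."""
    p_error = procedural_proportion(elements, "error")
    p_common = procedural_proportion(elements, "commonality")
    return (p_error + p_common) / 2
```

If the high-error slice is half procedural and the high-commonality slice
is entirely procedural, the test would draw 75 percent of its items from
procedural elements.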

4.  Standardize error and commonality scores and compute combined scores
for all elements. Rank order all procedure application elements and all
rule/principle application elements separately, in descending order of
combined error/commonality score. Select the elements with the highest
combined scores in each domain for representation as knowledge test
items, in accordance with the selection proportion determined in Step 3.
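
Step 4 can be sketched the same way. The plan does not say how the two
standardized scores are combined, so this sketch assumes an unweighted sum
of z-scores computed with the population standard deviation. It also
assumes the total number of items, `n_items`, is fixed in advance, and it
uses the same hypothetical element keys (`error`, `commonality`, `rule`).
`proc_share` is the proportion from Step 3.

```python
import statistics


def zscores(values):
    """Standard scores using the population SD; assumes values differ."""
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def select_elements(elements, n_items, proc_share):
    """Return (procedural indices, rule/principle indices) chosen for items.

    Combined score = z(error) + z(commonality), an assumed unweighted sum.
    Each domain is ranked separately in descending order of combined score.
    """
    z_err = zscores([e["error"] for e in elements])
    z_com = zscores([e["commonality"] for e in elements])
    combined = [a + b for a, b in zip(z_err, z_com)]
    order = sorted(range(len(elements)), key=lambda i: combined[i], reverse=True)
    procedural = [i for i in order if not elements[i]["rule"]]
    rule = [i for i in order if elements[i]["rule"]]
    n_proc = round(n_items * proc_share)  # items allotted to procedural domain
    return procedural[:n_proc], rule[:n_items - n_proc]
```

Ranking each domain separately keeps a few very high-scoring procedural
elements from crowding out the rule/principle items that the Step 3
proportion reserves.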

5.  Analyze and specify the knowledge requirements for each performance
element selected for test item construction. Statements of the knowledge
required for performance of each selected element will be prepared
independently by three SME. These statements will be compared by a member
of the research staff, who will attempt to resolve differences.

6.  Construct knowledge test items, or identify comparable items currently
in existing measures at Army schools. Statements of knowledge requirements
generated in Step 5 will be translated into knowledge test items in
accordance with standard prescriptions for the construction of knowledge
test items (e.g., Adkins, 1947; Anastasi, 1976). As indicated above, two
major types of knowledge are to be represented in the tests: knowledge
that must be generalized in its application and knowledge that does not. A
crucial aspect of test item construction, therefore, will be to ensure,
insofar as possible, that performance elements of tasks that involve the
generalization of knowledge maintain this requirement during translation
into test items. In all instances where the generalization of knowledge is
called for, we will endeavor to generate items that require the
application of rules or principles rather than the recognition, recall, or
restatement of rules or principles.

The total number of test items to be developed will vary as a function of
the number of different performance elements in an MOS. Although the
length of the knowledge tests will ultimately be constrained by the amount
of time available in training for test administration, these constraints
will not apply during the development phase. It is expected that a fairly
large number of items (300-400) will be developed in order to increase the
likelihood of ultimately capturing a substantial number of both
training-relevant and job-relevant items.


7.  Item tryout and revision. Test items will be administered, as they are
developed, to two groups of 50 trainees to ensure clarity,
comprehensibility, meaningfulness, and relevance of content to
performance. These characteristics will be examined in post-test
discussions with the test subjects. During the interval between the two
tryout administrations, changes indicated by the results of the first
administration will be made in the tests.


Test items will have been constructed and considered content-valid on the
basis of their representation of a sample of common and error-prone
elements of task performance in an MOS. Items will not have been selected,
for example, on the basis of their capacity to discriminate among students
in training. In Subtask 3.6, however, the comprehensive knowledge tests
will be administered to persons before the beginning of training, to
persons at the end of training, and to persons who are performing as job
incumbents. Items that reveal little capacity to detect knowledge acquired
either in training or on the job will be discarded after the field test in
which job incumbents are tested.
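
One simple operationalization of "little capacity to detect knowledge
acquired either in training or on the job" compares each item's proportion
correct across the three groups. The sketch below assumes responses are
scored 0/1 and grouped as "pre" (before training), "post" (end of
training), and "incumbent." The minimum gain of 0.10 over the pre-training
group is a hypothetical threshold, not one given in the plan.

```python
def p_value(responses):
    """Proportion of examinees answering an item correctly (1 = correct)."""
    return sum(responses) / len(responses)


def items_to_discard(items, min_gain=0.10):
    """Flag items showing little gain from training or from job experience.

    `items` maps an item id to {"pre": [...], "post": [...],
    "incumbent": [...]} lists of 0/1 responses. `min_gain` is an assumed
    threshold on the rise in proportion correct over the "pre" group.
    """
    flagged = []
    for item_id, groups in items.items():
        base = p_value(groups["pre"])
        training_gain = p_value(groups["post"]) - base
        job_gain = p_value(groups["incumbent"]) - base
        # Discard only if neither training nor the job raises performance.
        if training_gain < min_gain and job_gain < min_gain:
            flagged.append(item_id)
    return flagged
```

Keeping an item when either gain is large preserves items that are
job-relevant but not emphasized in training, and the reverse.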

After trial administration to trainees and any necessary revision, the
test items will constitute a pool from which at least two alternate test
forms can be assembled. The alternate test forms will be developed by
random sampling, stratified by commonality and error values.
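
Stratified random assignment of pool items to alternate forms can be
sketched as follows. The sketch assumes the continuous error and
commonality values have already been banded into levels (for example,
"high" and "low"). That banding, the item ids, and the round-robin dealing
within strata are illustrative assumptions, not details from the plan.

```python
import random
from collections import defaultdict


def assemble_forms(items, n_forms=2, seed=None):
    """Split an item pool into alternate forms by stratified random sampling.

    Each item is a dict with "id", "error_level", and "commonality_level";
    the levels are an assumed discretization of error and commonality.
    Within each stratum, items are shuffled and dealt round-robin, so every
    form draws about equally from every stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[(item["error_level"], item["commonality_level"])].append(item["id"])
    forms = [[] for _ in range(n_forms)]
    offset = 0
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        for i, item_id in enumerate(ids):
            forms[(offset + i) % n_forms].append(item_id)
        offset += len(ids)  # continue the deal so form sizes stay balanced
    return forms
```

Passing a fixed `seed` makes the assignment reproducible, which helps when
a form has to be regenerated for documentation or audit.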


8.  The job knowledge tests will be given to 100 trainees at the beginning
and end of their AIT courses and to an average of 500 job incumbents in
each of the MOS, along with the other criterion measures developed for
administration to the FY83/84 cohort. Following analysis of the data and
possible revisions of the tests, arrangements will be made for the
continuous administration of the tests to the FY86/87 cohort at their
respective schools. These tests will provide a principal basis for
establishing predictive relationships between school performance and
measures of both hands-on job performance and Army-wide performance.

Subtask 3.5: Development of Prototype Measures: Evaluation of Measures
of Free Response, Synthetic Testing of Hands-on Performance, Measures of
General Performance, and New Performance Indices

This subtask focuses on research designed to develop the means for
evaluating elements of task performance and other aspects of training
performance not amenable to evaluation with conventional knowledge tests.
New methods of testing and measurement are desirable in at least the
following areas: evaluation of elements of task performance in situations
where a free response is necessary to avoid artificial cueing of a
response; evaluation of task elements that involve perceptual-motor skill;
and evaluation of indicators of general performance and effectiveness in
training that are not related directly to the performance of MOS-specific
tasks. In addition, it is desirable to develop new indices, derived from
existing tests and measures, that provide information not available in the
measures themselves.


The overall objective of this subtask is to develop an array of prototype
measures that might profitably be tried out as indicators of training
success, criteria for selection research, or predictors of later job
performance. The subtask is not meant to yield a fully developed and
validated set of measures ready for implementation as operational measures
of training achievement or as final versions of criterion measures for
research. Consequently, the item tryouts and test development work for
free response and hands-on tests will be limited to a relatively small
number of MOS (the four MOS that will receive the Trial Predictor Battery:
05C, 19E/K, 63B, and 71L). However, the measures of general performance
and the new performance indices will be developed for all 19 MOS.

General Procedures. In general, the procedure will be to select test
content for each of the prototypes by modifying existing measures that
have been developed or are used in some other context. Once an array of
potential prototypes has been identified, the individual candidates will
be described comprehensively by Project A staff. The candidate measures
will then be subjected to expert review, much as the potential predictors
are being reviewed in Task 2; that is, they will be evaluated in terms of
their strengths and weaknesses by panels of research psychologists.

Prototype measures that survive the review process will then enter, along
with the knowledge test items, the steps of item development, content
validation by SME, and pretesting on tryout samples.


1.  Free response measures. The traditional use of multiple-choice tests
with identified response options is not adequate for all forms of
knowledge testing. In many instances it is desirable to evaluate the
knowledge that mediates performance of a particular element of a task
without any of the artificialities implicit in multiple-choice items. A
test of rule application, for example, should be constructed to provide
opportunities to reveal awareness of the relevance of the rule to the task
or situation at hand, as well as mere knowledge of the rule itself. With a
multiple-choice test it is difficult to define troubleshooting,
problem-solving, or decision-making tasks while simultaneously maintaining
the conditions of uncertainty that are characteristic, if not the essence,
of these types of situations.

Although considerations such as these might imply that hands-on testing is
the only viable format, there are, of course, a number of methodological
and practical difficulties in using hands-on tests in school settings:
(a) variation can be expected in administration and scoring; (b) the
number of tasks that can be evaluated is limited by time constraints; and
(c) hands-on tests are costly and time-consuming to construct as well as
to administer.

The general approach taken in Task 3 to the measurement of task
performance, described earlier (Subtask 3.4), is to decompose tasks into
their elements and sample among these elements for purposes of
assessment. With such a strategy it seems possible to allow a free
response format for a limited number of task elements in any given
measurement effort.


We will examine the feasibility of prototype measures of three types of
free response: (a) open-ended written items for group administration,
(b) verbal response in a one-on-one testing situation, and
(c) performance/demonstration in a one-on-one testing situation. These
three formats are likely to vary both in efficiency of administration and
in suitability, according to the properties of the task elements being
measured and the characteristics of the population being tested. Thus
open-ended written items are clearly the most efficient but may be
suitable only for persons of above-average verbal ability or for
situations where the correct response is commonly described in one or two
words. For persons of lower verbal aptitude, performance or demonstration
of the task element may be most appropriate.

We will examine the feasibility of free response measurement of task
elements in the four MOS for which hands-on performance measures have been
developed (Task 5) and to which the Trial Predictor Battery will be
administered, so that the relation of the free response measures to these
measures can be determined. The following general procedure will be used:

(1)  Prepare detailed research and development plans for development of
free response prototype measures. Prepare troop support request. Submit
plans and troop support request to ARI.

(2)  Via judgments of SME, designate elements of task behavior that
require application of rules or principles or that are otherwise suitable
candidates for evaluation via free response measurement.

(3)  Construct a pool of free response evaluation items in three formats:
open-ended written, verbal, and performance/demonstration.

(4)  Try out and revise items on two trial samples of 50 AIT trainees in
each MOS for which a free response item pool was constructed. Evaluate
items in terms of difficulty, understandability, and feasibility of
standardization for operational use.

(5)  Administer items to a sample of 100 trainees at the beginning and end
of the AIT course (see Subtask 3.6).

(6)  Administer free response items during the administration of the
FY83/84 cohort first-tour performance measures (which include the hands-on
tests for the tasks from which the free response items were derived).

(7)  Analyze the predictability of hands-on task performance by free
response task element format, trainee characteristics, and task element
characteristics. Report results.
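
The first cut of the analysis in step (7), predictability broken down by
free response format, can be sketched as one correlation per format
between free response scores and the corresponding hands-on scores. The
record layout and format labels are assumptions. The sketch covers only the
breakdown by format. Trainee and task element characteristics would enter
as further grouping keys or as predictors in a fuller analysis.

```python
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation; assumes equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def predictability_by_format(records):
    """Correlate free response scores with hands-on scores, per format.

    `records` holds (format, free_response_score, hands_on_score) tuples;
    format labels such as "written", "verbal", "demonstration" are assumed.
    """
    by_format = defaultdict(lambda: ([], []))
    for fmt, free_score, hands_on_score in records:
        xs, ys = by_format[fmt]
        xs.append(free_score)
        ys.append(hands_on_score)
    return {fmt: pearson_r(xs, ys) for fmt, (xs, ys) in by_format.items()}
```

Comparing the per-format coefficients gives a rough indication of which
free response format best tracks hands-on performance for a given kind of
task element.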


2.  Synthetic testing of hands-on performance. Hands-on tests are
generally recognized as the method of performance evaluation that provides
the most direct and complete means of assessing task proficiency. The
major need for hands-on tests occurs, however, in tasks that call for a
display of perceptual-motor skill. Such skill cannot be represented
adequately in a conventional paper-and-pencil test of job knowledge.
Unfortunately, hands-on tests that are appropriate for such measurement
are expensive in cost and time to develop and administer. For obvious
reasons, psychologists have frequently sought measures represented in
various kinds of simulations as substitutes for a full hands-on test,
i.e., display of criterion behavior in a realistic criterion setting. One
such attempt has been that of Osborn and Ford (1976), who coined the term
synthetic testing to refer to "... a job performance test that has been
degraded to some degree in the range of task elements covered or in the
fidelity of stimulus/response features".

Osborn and his colleagues have explored a variety of issues in synthetic
testing, but the basic strategy involves determining whether a task can be
meaningfully partitioned into subtasks or steps, partitioning it into
those parts, and then, depending on the characteristics of the task and
the purpose of testing, following one of these procedures:


o  Select a test modality for a part-task test of the most difficult
element.

o  Test all task elements using a method appropriate for the most
difficult one.

o  Test the most difficult task element with its most appropriate method
and the remaining parts of the task with the most efficient method.


In the present examination of prototype synthetic testing, a variant of
the first procedure will be used: hands-on tests will be designed for task
elements (two) that call for a display of perceptual-motor skill. The
predictability of task performance from synthetic test performance will
be examined in the four MOS using the following general procedure:


(1)  Prepare detailed research and development plans for development of
synthetic test prototype measures. Prepare troop support request. Submit
plans and troop support request to ARI.

(2)  Identify elements of task behavior requiring display of
perceptual-motor skill. (See procedures in Subtask 3.4.)

(3)  Construct synthetic hands-on prototypes for perceptual-motor task
elements using the methods outlined in Osborn and Ford (1976). The initial
test content will be derived from hands-on items developed in Task 5 and
from existing hands-on measures (if any) currently in use in AIT schools.
The items will be revised and "degraded" by Task 3 staff in consultation
with Task 5 staff.

(4)  Try out and revise prototype items on the two trial samples of 50 AIT
graduates (along with the job knowledge and free response measures).

(5)  Administer measures to a sample of 100 trainees at the beginning and
end of the AIT course (see Subtask 3.6).

(6)  Administer revised synthetic tests and Task 5 hands-on performance
tests for tasks containing perceptual-motor elements to the FY83/84
cohort.

(7)  Analyze the predictability of hands-on performance from synthetic
test performance by trainee characteristics. Report results.

3.  Measures of general performance in training. The model of soldier
effectiveness developed by Task 4 includes two types of measures:
objective measures (attrition, number of AWOLs, awards given, etc.) and
performance ratings. In developing measures of general performance in
training, we will develop a separate checklist of questions for each of
the two types of measures. These checklists will provide information on
both the appropriateness of each dimension for the training environment
and the feasibility of measuring it in a school setting. For example,
typical questions asked about each objective measure, such as absenteeism,
would be:

(1)  Is  this  measure  recorded  accurately? 

(2) Does this measure tell you anything about a trainee's
performance in school?

(3)  Is  this  measure  currently  collected  by  the  school? 

(4)  What  other  measures  should  be  collected  that  relate  to 
the  performance  dimensions? 

(5) Is there much difference among trainees on this measure?

In addition to the above, typical questions asked about ratings would
include:


(1) Can you apply this rating method in your class?


(2) Do you have time to do so?

(3)  Do  you  have  sufficient  contact  to  rate  trainees? 

(4)  Could  we  change  this  measure  to  fit  your  needs?  How? 

(5) Do you feel this particular dimension or scale is
relevant to job performance?

(6) Would you support the development or use of such rating
scales in training?

(7)  Are  there  any  rating  scales  currently  being  used? 

(8)  Are  there  other  ratings  that  should  be  developed  to 
measure  training  performance? 

We will administer each checklist to two course instructors and two Army
unit leaders in each of the 19 MOS. The checklists will also go to Army
unit leaders because it is expected that school personnel might not be
familiar with the value of some of the objective measures kept at the unit
level.

The basic approach to development of these prototypes will be to begin with
the objective indices and Army-wide rating scales under development in
Task 4 and evaluate the feasibility of using them in school settings.
Initially, at least, it seems quite reasonable that rating dimensions such
as "overall performance" or "demonstrated commitment to the Army" could be
modified for use by school personnel to rate students.

Once modified, the candidate measures would be evaluated by Project A and
ARI staff and by a panel of instructional staff to identify major
shortcomings and gaps in the array of scales that could be filled in by
additional development work.




If major gaps in the array of dimensions for rating general factors in
training performance are identified (e.g., "mastery of factual knowledge"),
Task 4 procedures will be used to construct new scales to fill them.


We will pilot test the selected measures on the same sample of trainees to
whom the revised knowledge tests will be administered, and collect
descriptive data about the distribution of the scores on these measures.
Such data will include the mean, median, mode, standard deviation,
coefficient of variation, interquartile range, etc. The decision as to
which of the measures are the "best," i.e., have the most potential for
predicting job performance, will be made on the basis of the standards set
up by the experts in Subtask 3.2 for existing measures. The most
appropriate means and frequency of administration for these measures
depend on the characteristics of the measures and on specific situational
requirements.
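The descriptive summary named above can be produced with Python's standard statistics module. A minimal sketch, assuming the scores for one measure arrive as a list of numbers; the function name describe is hypothetical. Note that statistics.quantiles (Python 3.8+) uses its default "exclusive" interpolation, one of several quartile conventions, so the interquartile range can differ slightly from other software.

```python
import statistics


def describe(scores):
    """Descriptive statistics for the scores on one prototype measure."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)
    q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
    return {
        "n": len(scores),
        "mean": mean,
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),  # first mode if several tie
        "sd": sd,
        # Coefficient of variation; meaningful only for ratio-scale scores with mean > 0.
        "cv": sd / mean,
        "iqr": q3 - q1,
    }
```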

The measures chosen will be discussed with training managers at the schools
for each of the 19 MOS, where we will seek implementation. Data will then be
collected on a continuing basis, as with other school measures, for the
later portion of the FY83/84 cohort and, if analysis of the FY83/84
concurrent validation data shows them to be effective predictors or
criteria, for the FY86/87 cohort.

4. New Performance Indices. Although it is difficult to talk about
specifics before our compilation of information from training schools is
complete, new indices of performance will be formulated whenever a
combination of existing measures, or refinements of these measures, offers
promise of providing useful information not contained in the original
measures themselves, and the refinements can be accomplished in an
economical and useful fashion. For example, combining measures of learning
time and achievement level may generate an index that would be reasonably
easy to determine and significantly related to later performance measures.
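As one illustration of such a combination, the sketch below standardizes adjusted learning time and achievement within a class and averages them, reverse-scoring time so that faster learners score higher. The equal weights, the z-score scaling, and the function names are assumptions made for the example; the plan does not specify how an index would be formed. It also assumes the time figures have already been corrected for the extraneous factors discussed below.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values within one class: mean 0, population SD 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("values have no variance")
    return [(v - m) / s for v in values]


def time_achievement_index(adjusted_hours, achievement, w_time=0.5, w_ach=0.5):
    """Hypothetical index combining learning time and achievement level.

    adjusted_hours: training time per trainee, already corrected for excused time.
    achievement: an achievement score per trainee, in the same order.
    Time enters with a negative sign, so short learning time raises the index.
    """
    zt = z_scores(adjusted_hours)
    za = z_scores(achievement)
    return [w_ach * a - w_time * t for t, a in zip(zt, za)]
```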

We will compute such indices and test their relationship to job performance
measures obtained from the FY83/84 cohort in the 19 MOS. Unlike tests such
as job knowledge tests, which can be administered out of context to
incumbents, these indices can only be validated longitudinally, because
they include actual training outcomes (such as time spent). This
disadvantage is offset by the fact that they are readily obtained from all
trainees and require no additional test administration time.

In obtaining time measures, several factors must be accounted for or
controlled, since the training time routinely computed for administrative
purposes is influenced by extraneous factors.

In a self-paced course, if trainees determine when to present themselves
for testing, it is ordinarily not possible to determine at which point they
reached acceptable mastery (Christal, 1976). One student with a passing
score may have learned enough to pass just prior to the test; another,
following a more conservative strategy, may actually have reached criterion
well before he or she elected to be tested, and then spent additional time
over-studying the lesson.




In many courses, adjustments are made for time lost to sick call,
non-training extra duty, etc. The method of accounting for this time varies
considerably. Project A is expected to require closer control over the
accounting for these and other factors than is currently needed for
administrative purposes.

Draft and final plan for prototype measure data collection. The
introduction of prototype measurement into on-going training programs,
apart from the generation of new indices from existing measures, is
dependent on the acceptance of such measurement by course managers and
instructors. Because the new measures will be developed with the
cooperation and assistance of course personnel, and because considerable
attention will be devoted to explaining the purpose and benefits of the
measurement, it is expected that in general the acceptance of the new
measurement methods into on-going training will not present a problem.
Where the use of the new measures imposes additional personnel
requirements, e.g., test administrators/scorers for hands-on tests, such
demands may not be as readily met.

The actual requirements and the ways in which the new measures will be
administered cannot be anticipated, since they depend on the particular
characteristics of the individual training programs. When development of
the new measures has begun at individual schools, the Task 3 staff will
prepare a draft plan for data collection from trainees in the FY83/84
cohort and submit the plan to the COR for review. The plan will discuss
types of measurement, administration procedures, frequency and locus of
measurement in training, and scoring procedures. After review by the COR,
the draft plan will be revised as necessary and a final version
submitted. A similar plan will be submitted for data collection in the
FY86/87 cohort.

Subtask 3.6: Identification of Training-Relevant and Job-Relevant Knowledge
Test Items

To provide a basis for interpreting predictive relationships between
measures of school and job performance, the relevance of knowledge test
items to training and job content will be determined. After the knowledge
tests have been revised on the basis of the tryouts, they will also be
administered to samples of entering trainees and samples of job incumbents
in order to assess the effects of training and job learning. The testing
of job incumbents will occur as part of the first and second field tests,
to be conducted during April-June 1984 and November 1984-January 1985,
respectively. In these field tests, knowledge tests will be administered
to incumbents in the 9 MOS for which hands-on performance measures are
being constructed in Task 5, MOS in which training/job performance
relationships can therefore later be analyzed.¹

When the test data are available, test item difficulty indices will be
compared for entering trainees, graduating trainees, and job incumbents.
Items that reveal decreased difficulty (a higher percentage correct) as a
consequence of training will be defined as "training-relevant"; items that
reveal little change as a result of training but a decrease in difficulty
for job incumbents will be defined as "job-relevant"; and items that
demonstrate a decrease both at the end of training and among job incumbents
will be defined as "training- and job-relevant." It is expected that most
items will fall in the third category.

¹Hands-on tests (Task 5) will be administered to incumbents at skill level
1 in 4 and 5 MOS during the first and second field tests, respectively.
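The classification rule above can be written as a small decision function. This is a sketch under two stated assumptions: "decreased difficulty" is taken to mean a rise in proportion correct of at least a hypothetical min_gain of .10, and both graduating trainees and incumbents are compared against entering trainees (the text does not fix the baseline for incumbents). Items meeting neither condition are left unclassified.

```python
EPS = 1e-9  # guards against floating-point round-off exactly at the threshold


def p_correct(responses):
    """Proportion correct for one item; responses are 1 (correct) or 0 (incorrect)."""
    return sum(responses) / len(responses)


def classify_item(p_entry, p_grad, p_incumbent, min_gain=0.10):
    """Label one knowledge-test item from its proportion correct in three groups.

    Both later groups are compared with entering trainees (an assumption);
    min_gain = 0.10 is an arbitrary placeholder threshold.
    """
    trained = p_grad - p_entry >= min_gain - EPS
    on_job = p_incumbent - p_entry >= min_gain - EPS
    if trained and on_job:
        return "training- and job-relevant"
    if trained:
        return "training-relevant"
    if on_job:
        return "job-relevant"
    return "unclassified"
```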

In order to isolate the mechanisms that mediate relationships between
training performance and job performance using a cross-sectional design, at
least two assumptions must be made:

(1) course content in a particular MOS remains relatively
constant over time, and

(2) the groups in question are well matched with respect to
any variables that may affect job knowledge test scores,
such as intelligence, race, sex, etc.

To the extent that these assumptions are not met, the meaning attributed to
the subsets of items identified in this subtask becomes suspect. For
example, differences in scores on the knowledge test may be due to group
differences in intelligence. If this were the case, the items might be more
appropriately labeled "intelligence-relevant" than "job-relevant" or
"training-relevant." Also, differences in scores on the knowledge test may
be due to group differences in training content. Thus, items that were
formerly, but not presently, included in training could be erroneously
interpreted as job-relevant merely because job incumbents who received an
earlier version of training showed higher mean scores. Shifts in training
content are not infrequent in military instruction and may represent a more
pronounced threat to construct validity than failure to match groups.


To avoid these problems, we plan to use a longitudinal design in which a
sample of 100 trainees is tested upon first entering AIT and then retested
at the end of their training. A counterbalanced design will be employed,
with two groups of 50 trainees first taking alternate forms of the
knowledge test and later taking the other form. Our estimates (p. 19)
indicate that between 19% and 52% of these soldiers, depending on MOS, will
be among the 500 tested during the FY83/84 cohort performance data
collection. If we assume 30 are required for meaningful retest results,
about 16 MOS will have sufficient data. Consequently, we will be able to
compare performance across time with more rigor than could be accomplished
with independent groups.
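The counterbalanced assignment and the sample-size reasoning above can be sketched as follows. Only the sample of 100, the 19% to 52% overlap range, and the cutoff of 30 come from the plan; the function names are hypothetical and any per-MOS rates fed to the second function would be estimates. With 100 trainees per MOS, a 19% overlap yields about 19 retested soldiers (short of 30), while 52% yields about 52.

```python
import random


def counterbalance(trainee_ids, seed=None):
    """Randomly split trainees into two equal halves with opposite form orders.

    Half A takes Form 1 at AIT entry and Form 2 at graduation; half B the reverse.
    Returns {trainee_id: (entry_form, graduation_form)}.
    """
    ids = list(trainee_ids)
    if len(ids) % 2:
        raise ValueError("an even number of trainees is assumed")
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    plan = {t: ("Form 1", "Form 2") for t in ids[:half]}
    plan.update({t: ("Form 2", "Form 1") for t in ids[half:]})
    return plan


def mos_with_enough_retests(overlap_rates, sample_size=100, required=30):
    """MOS whose expected number of retested soldiers meets the required count.

    overlap_rates: {MOS: estimated share of the longitudinal sample that is
    also tested in the FY83/84 cohort performance data collection}.
    """
    return sorted(
        mos for mos, rate in overlap_rates.items()
        if round(sample_size * rate) >= required
    )
```

For example, with made-up rates of 0.19, 0.52, and 0.30 for three MOS, only the last two meet the cutoff of 30.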

As mentioned earlier, the ratings of general performance in training, the
new performance indices, and, in the case of 4 MOS (05E, 19E/K, 63B, and
71L), the free response and synthetic measures will also be obtained for
the sample of 100 trainees. The interrelationships among the new as well
as existing school measures will be determined, as will their separate
relationships with existing and new predictors. Of particular interest
will be any shifts in relationships between predictors and school measures
that occur over time, e.g., a drop in predictability of knowledge test
scores from AIT entry to graduation to job incumbency.

The data from the longitudinal research will be analyzed to determine the
amount of variance in job performance explained by the various subsets of
items. First, knowledge items will be partitioned into one of four
groups. Group 1 will consist of items that do not require generalization
and are learned in training. Group 2 will consist of items that do require
generalization and are learned in training. Groups 3 and 4 will consist
of items that are learned on the job (are less difficult for incumbents)
and are procedure-related or rule-related, respectively.
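To enter the four item groups into a regression, each examinee needs one subscore per group. A minimal sketch, assuming each item has already been tagged with where it is learned and with the two content attributes the text names; the attribute labels and function names are hypothetical.

```python
def assign_group(learned_in, requires_generalization=False, content_type=None):
    """Return item group 1-4 from an item's attributes, or None if it fits no group.

    learned_in: "training" or "job" (from the difficulty comparisons).
    requires_generalization: used only for items learned in training.
    content_type: "procedure" or "rule", used only for items learned on the job.
    """
    if learned_in == "training":
        return 2 if requires_generalization else 1
    if learned_in == "job":
        return {"procedure": 3, "rule": 4}.get(content_type)
    return None


def group_subscores(item_scores, item_groups, n_groups=4):
    """Sum one examinee's item scores (1 = correct, 0 = incorrect) within each group.

    item_scores: {item_id: score}; item_groups: {item_id: group number or None}.
    Items without a group are ignored. Returns [Group 1, ..., Group 4] subscores.
    """
    totals = [0] * n_groups
    for item, score in item_scores.items():
        group = item_groups.get(item)
        if group is not None:
            totals[group - 1] += score
    return totals
```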

Several multiple regression equations will be developed and tested within
each of the 19 MOS. The equations will consist of a job performance
measure (as the criterion), one or more measures of cognitive ability and
other pre-induction predictors, and the four groups of test items defined
above, as follows:

    Job Performance = Pre-Induction Measures + Group 1 + Group 2 + Group 3 + Group 4

This model will be tested against a model that includes only the
pre-induction measures:

    Job Performance = Pre-Induction Measures

If there are no significant differences between these two models, as tested
by an F procedure outlined by Cohen and Cohen (1975), it would appear
that the knowledge items are not capturing any unique or additional portion
of the criterion variance. If, however, one or more types of item
consistently enter into the equations with a significant F value, we will
be able to tailor further knowledge test development in the direction of
increasing those types of items.
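The comparison of the two nested models is the usual test of the increment in R-squared. With N soldiers, k_f predictors in the full model and k_r in the pre-induction-only model, F = [(R²_f - R²_r)/(k_f - k_r)] / [(1 - R²_f)/(N - k_f - 1)], with (k_f - k_r, N - k_f - 1) degrees of freedom. A minimal sketch of the arithmetic, assuming the two R-squared values come from separately fitted regressions on the same cases; the function name and example values are hypothetical, and the p-value (which needs the F distribution, not in the Python standard library) is left to a statistics package.

```python
def f_change(r2_full, r2_reduced, k_full, k_reduced, n):
    """F statistic for the increase in R-squared when predictors are added.

    Compares a full model (k_full predictors) with a nested reduced model
    (k_reduced predictors) fitted to the same n cases.
    Returns (F, df_numerator, df_denominator).
    """
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    if df1 <= 0 or df2 <= 0:
        raise ValueError("need k_full > k_reduced and n > k_full + 1")
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2
```

With N = 200, two pre-induction measures, and R-squared rising from .20 to .26 when the four item-group subscores are added, F(4, 193) is about 3.91.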


In addition to the empirical methods of identifying training-relevant and
job-relevant test items described above, judgments of trainers will also
be used. In each MOS, three SMEs will independently sort behavioral
elements from task analyses into those that should be mastered by graduates
of their course, considering the course objectives and content; those that
should be mastered on the job; and those that are not job-relevant. The
correspondence between the test item content and these behavioral elements
will also be judged by the SMEs. This procedure will provide a means of
verifying and interpreting the results of the quantitative analysis.
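The plan does not say how the three SMEs' sorts are combined. One simple possibility, shown here as a hypothetical sketch, is a majority rule that also reports the level of agreement, so elements on which the SMEs disagree can be flagged for discussion; the category labels paraphrase the three sorting categories above.

```python
from collections import Counter

CATEGORIES = ("master in course", "master on job", "not job-relevant")


def consensus(sorts):
    """Combine independent SME sorts of one behavioral element.

    sorts: one category label per SME (three in the plan).
    Returns (label, agreement): label is the category chosen by a strict
    majority of SMEs, or None if there is no majority; agreement is the
    share of SMEs who chose the most frequent category.
    """
    if not sorts:
        raise ValueError("no sorts given")
    unknown = set(sorts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    label, count = Counter(sorts).most_common(1)[0]
    agreement = count / len(sorts)
    return (label if count * 2 > len(sorts) else None), agreement
```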


Subtask 3.7: School Criterion Composites and Predictor Composites

Candidate training measures to be used in validating initial predictors and
in predicting MOS-specific and Army-wide performance will be identified or
developed beginning in Subtask 3.2 with the review of existing measures,
and continuing through the development and tryout of prototype measures
in Subtask 3.5 and the further refinement of the job knowledge tests
in Subtask 3.6.


The relationship of these measures to existing and newly developed
predictors will be examined through administrations of predictors to FY83/84
cohort personnel from October 1983 through June 1984. The relation of
training measures to subsequent criteria will be examined through
administrations of Army-wide and MOS-specific measures to FY83/84 cohort
personnel from June through September 1985.


On the basis of these data collections, training measures will be assembled
into integrated sets of criteria and predictors. The construction of
predictor sets will be essentially empirical. Candidate predictor measures
will be factor analyzed, and regression techniques will be used to identify
groups of measures that contribute the most to the predictability of the
various criteria that will then be available.




Building criterion sets, however, will also require judgments about the
relative contribution of different measures to the construct "school
success." Since school success cannot validly be defined simply as whatever
components of training are predictable, the judgments of school personnel
will be obtained to inform this selection and weighting process. For
example, in self-paced courses two potential measures are: (a) total time
to complete the course, and (b) mean score on the first attempt to pass
module tests. Which of these is more important: to spend the least amount
of time in training or to obtain the highest possible score?

If these two measures are found to be highly correlated (which is not
unlikely, since students of higher ability may score high on both counts),
there is no serious conflict to be resolved in weighting them in a
criterion composite, since alternate sets of positive weights would produce
composites that are highly correlated. But if these or other measures are
less highly correlated, a value judgment must be sought. We will convene
panels of officers for each of the 19 MOS in Project A for this
purpose. Panel participants will be presented with descriptions and
explanations of the candidate criteria, which the panel will be asked to
weight according to their contribution to the definition of a "successful
trainee."
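The claim that alternate sets of positive weights give highly correlated composites when the components are highly correlated can be checked directly. For two standardized measures with intercorrelation r, the correlation between composites w1*z1 + w2*z2 and v1*z1 + v2*z2 follows from their covariance and variances. A minimal sketch; the function name, weights, and r values are illustrative, and course time is assumed to be reverse-scored so that both measures point in the "better" direction.

```python
from math import sqrt


def composite_correlation(w, v, r):
    """Correlation between two weighted composites of the same two standardized measures.

    w, v: (weight_1, weight_2) pairs; r: correlation between the two measures.
    Uses cov = w1*v1 + w2*v2 + (w1*v2 + w2*v1)*r and the analogous variances.
    """
    cov = w[0] * v[0] + w[1] * v[1] + (w[0] * v[1] + w[1] * v[0]) * r
    var_w = w[0] ** 2 + w[1] ** 2 + 2 * w[0] * w[1] * r
    var_v = v[0] ** 2 + v[1] ** 2 + 2 * v[0] * v[1] * r
    return cov / sqrt(var_w * var_v)
```

For weights of .7/.3 versus .3/.7, the two composites correlate about .98 when r = .90 but only about .81 when r = .20, which is the situation in which the panel's value judgment matters.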

These criterion composites will then be used in the longitudinal validation
of initial predictors using the FY86/87 cohort. Monitoring of the
administration of these measures will take place from March 1986 through
February 1987.




Subtask  3.8:  Analyze  Predictive  Relationships  and  Prepare  Reports 


As data become available from the administration of performance measures to
the FY83/84 cohort (June-September 1985), correlation matrices will be
used to identify groups of measures that are highly related to MOS-specific
and Army-wide criteria. This will involve existing school measures
reviewed in Subtask 3.2, prototype measures developed in Subtask 3.5, and
job-relevant and training-relevant knowledge test items identified in
Subtask 3.6. Cluster analysis will first be used to determine which types
of training measures are most related in general to the various types of
criteria. All analyses will be coordinated with Task 1.
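A minimal sketch of the first screening step: correlating each training measure with each criterion and shortlisting the measures whose correlation with any criterion reaches a cutoff. The cutoff of .30, the data layout, and the function names are hypothetical; the cluster analysis the plan describes would operate on a matrix like this one rather than on a fixed cutoff.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a score list with no variance has no defined correlation")
    return sxy / sqrt(sxx * syy)


def measures_related_to_criteria(measures, criteria, min_r=0.30):
    """Cross-correlation table between training measures and criteria, plus a shortlist.

    measures, criteria: {name: list of scores}, same soldiers in the same order.
    Returns ({(measure, criterion): r}, sorted names of measures with |r| >= min_r
    for at least one criterion).
    """
    table = {
        (m, c): pearson_r(m_scores, c_scores)
        for m, m_scores in measures.items()
        for c, c_scores in criteria.items()
    }
    shortlist = sorted({m for (m, _), r in table.items() if abs(r) >= min_r})
    return table, shortlist
```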

The predictive relationships between the new (Task 2) predictors
administered to the FY83/84 cohort during the same period and the training
measures will likewise be determined, using the same methods as used in
Subtask 3.2 to determine relationships between the current ASVAB and
current school measures. These analyses will be used to help select the
integrated sets of training measures to be used in the main longitudinal
validation with the FY86/87 cohort, beginning March 1986.

The relationship of new predictors administered to the FY86/87 cohort to
Task 3 training measures will be analyzed as cohort members pass through
training and data on both measures become available (approximately
July 1986 through April 1987).

The results of these analyses will be presented in a draft technical report
and a draft instruments report, which will be revised on the basis of
comments from the COR. The final technical report will be prepared in two
parts. The first part will be an executive summary of the types of
information useful to Army personnel managers; the second part will be a
detailed narrative in a format suitable for ARI publication and submission
to psychological journals.



SUMMARY  OF  EXPECTED  OUTCOMES  FROM  TASK  3 


1. Integrated sets of training measures for a sample of MOS, to serve as
criteria to validate initial predictors and as predictors of subsequent
Army performance.


2.  Improved  comprehensive  job  knowledge  tests. 

3. Prototype training measures and performance indices to assess
components of training success not represented in existing training measures.

4. A procedure for sampling the job content domain to select test content
that is more objective and reliable than sampling strategies currently
being used operationally.

5.  Delineation  of  the  current  Army  training  development  and  evaluation 
system. 


6. Identification of measures now being administered or recommended for
use in training that have the requisite characteristics to be used as
surrogate criteria of job performance in on-going development of Army
selection and classification procedures and instruments. In addition
to development of new measures, Task 3 will yield an evaluation of
existing measures as predictors of job performance and as candidate
validation criteria. To the extent that general classes of existing
measures are found to be stable across MOS in their predictive validity,
the evaluation and classification of existing measures will permit
generalizing findings beyond the 19 MOS in Project A.

Scientific  Outcome 

1. Determination of the predictive mechanism(s) that explain the
relationships between training performance and job performance, using two
different methods. Predicting job performance using (a) training-relevant
items only and (b) job- and training-relevant items will be used as an
indirect means of determining the mechanisms that mediate the relationship
between training performance and job performance. Inspection of the
unique and common variance in job performance associated with each subset
of items will indicate whether learning ability itself (and by inference,
learning on the job) or the commonality of elements between school
performance measures and job performance measures is the key factor in
predicting job performance from training performance.

2. Determination of the adequacy of training performance for validating
selection procedures. A fundamental purpose of Project A is to determine
whether training performance can serve as an adequate surrogate for job
performance in establishing the validity of initial predictors such as
ASVAB. Since undergoing training consists of different activities from
performing a job, it cannot be assumed that predictors of success in
training will also be good predictors of job performance. However, using
school performance as the validation criterion achieves considerable
savings in resources compared to validating against job performance. The
development of training-relevant and job-relevant training measures in
Task 3, therefore, will enable the predictive relationship between school
performance and job performance to be determined. From this will come a
determination of the adequacy of school performance as an economical
validation criterion.

3. Determination of which component(s) of training performance should
serve as the focus of predictor development. In addition to establishing
overall relationships between school performance and job performance,
Task 3 will yield information about various components of school
performance (knowledge acquisition, hands-on application, speed of
learning, etc.) as they relate to job performance. If speed of learning
(a possible motivation measure) is found, in general, to be a better
predictor of job performance than knowledge acquisition, for example,
this information would guide the development of initial predictors
toward that construct.

4. Determination of whether training performance predicts differentially
for different groups of trainees (race, gender, mental aptitude) and
different groups of MOS (combat, administrative, etc., or other
groups). The relationships between training performance and job
performance will be analyzed in terms of personal variables and job
variables that can be expected to moderate the relationships. For
example, school performance may be found to be a better predictor of
later performance in jobs that are relatively procedural, such as
administrative specialist, than in jobs that are less structured or
involve more interpersonal contact and decision making, such as military
police. Similarly, if training performance is found to be predictive for
certain subgroups and not others, then predictors could be selected on the
basis of their subgroup correlations, with resulting improvements in
accuracy.
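One standard way to check whether a training-performance validity coefficient differs between two independent subgroups is Fisher's r-to-z test. The sketch below is offered as an illustration of that technique; the plan itself does not name a test, and the function name and example values are hypothetical.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for a difference between two independent correlations.

    r1, r2: correlations observed in two subgroups; n1, n2: subgroup sizes.
    Returns (z, two_sided_p), with p from the standard normal distribution.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each subgroup needs more than three cases")
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

For instance, r = .60 in one subgroup of 200 and r = .20 in another of 200 gives z of about 4.87, a difference well beyond chance.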


REFERENCES 


Adkins, D.C. Construction and analysis of achievement tests.
Washington, DC: United States Civil Service Commission, 1947.

Anastasi, A. Psychological Testing (4th ed.). New York, NY: Macmillan
Publishing Co., Inc., 1976.


Burtch, L.D., Lipscomb, M.S., & Wissman, D.J. Aptitude requirements based
on task difficulty: Methodology for evaluation (AFHRL-TR-81-34).
Brooks Air Force Base, TX: Manpower and Personnel Division, January
1982.

Christal, R.E. What is the value of aptitude tests? Proceedings of the
18th Annual Conference of the Military Testing Association, October
1976.

Cook, T.D., & Campbell, D.T. The design and conduct of quasi-experiments
and true experiments in field settings. In M.D. Dunnette (Ed.), Handbook
of Industrial and Organizational Psychology. Chicago: Rand
McNally, 1976.

Ellis, J.A., Wulfeck, W.H., II, & Fredericks, P.S. The instructional
quality inventory: II. User's manual (NPRDC Special Report 79-24).
San Diego, CA: Navy Personnel Research and Development Center, August
1979.

Guidelines for development of skill qualification tests. Ft. Eustis, VA:
Individual Training and Evaluation Directorate, U.S. Army Training
Support Center, December 1977.

Interservice procedures for instructional systems development. TRADOC
Pamphlet 350-30, 1975.

Lumsdaine, A.A. Design of training aids and devices. In J.D. Folley, Jr.
(Ed.), Human Factors Methods for System Design (pp. 217-290). Pittsburgh:
The American Institute for Research for Office of Naval Research, 1960.
AIR-C90-60-FR-225.

McCain, L.T., & McCleary, R. The statistical analysis of the simple
interrupted time-series quasi-experiment. In T.D. Cook & D.T.
Campbell (Eds.), Quasi-Experimentation: Design and Analysis Issues for
Field Settings. Chicago: Rand McNally, 1979.

Mead, D.F. Determining training priorities for job tasks. Proceedings
of the 17th Annual Conference of the Military Testing Association, 1975.

Osborn, W.C., & Ford, P.J. Research on methods of synthetic performance
testing. U.S. Army Research Institute for the Behavioral and Social
Sciences, April 1976.




Pickering, E.J., & Anderson, A.V. Measurement of job-performance
capabilities (NPRDC TR 77-6). San Diego, CA: Navy Personnel Research and
Development Center, December 1976.

Smode, A.F., Gruber, A., & Ely, J.H. The measurement of advanced
flight vehicle crew proficiency in synthetic ground environments
(MRL-TDR-62-1). Wright-Patterson AFB, OH: Behavioral Sciences
Laboratory, February 1962.

Vineberg, R., & Joyner, J.N. Prediction of job performance: Review of
military studies (NPRDC TR 82-37). San Diego, CA: Navy Personnel
Research and Development Center, March 1982.

Vineberg, R., & Taylor, E.N. Performance in four Army jobs by men at
different aptitude (AFQT) levels: 3. The relationship of AFQT and
job experience to job performance (HumRRO Tech. Rep. 72-22).
Alexandria, VA: Human Resources Research Organization, August 1972.

Wernimont, P.F., & Campbell, J.P. Signs, samples, and criteria. Journal
of Applied Psychology, 1968, 52, 372-376.

Wheaton, G.R., & Fingerman, P.W. Development of a model tank gunnery test
(ARI TR-78-A24). Alexandria, VA: U.S. Army Research Institute for
the Behavioral and Social Sciences, August 1978.




TASK  4 




MEASUREMENT  OF  ARMY-WIDE  PERFORMANCE 


GENERAL  PURPOSE  OF  TASK  4 


This task is devoted to the identification, refinement, and development of
in-service predictors and Army-wide performance measures. In-service
predictors are measures obtained after a soldier enters the Army that
predict the soldier's later performance or effectiveness in his/her
military career. Army-wide performance measures are those indicators of
general performance and effectiveness not related directly to the
performance of MOS-specific tasks. This effectiveness domain may also
contain measures of a soldier's overall value or worth to the Army.


The central goals of this task are: (a) to identify aspects of soldier effectiveness that apply to all MOS; (b) to identify and/or develop valid indicators to measure these aspects of effectiveness; and (c) to establish the indicators as criteria of soldier effectiveness and, where appropriate, as in-service predictors of future performance or other aspects of soldier effectiveness. Measures must be identified and refined or developed for both first-tour and second-tour performance. In addition, research must determine the utility to the Army of performance levels established by these measures.


Definition of Army-wide effectiveness will require careful specification of the relevant criterion space. "Outcome indicators" and objective administrative indices such as attrition, disciplinary actions, special awards, schools attended, etc., are clearly Army-wide criteria, and measures of these types of criteria will be of concern in the research. A second focal point will be the development of measures of general performance and of soldier effectiveness/worth to the Army. The "worth to the Army" construct is conceptualized as including a relatively broad set of soldier effectiveness criteria such as organizational commitment, organizational socialization, and morale. Ideally, it is intended to index a soldier's overall value to his/her unit and the Army.

Special behavior-based rating scales will be prepared to measure soldier effectiveness on all important dimensions identified in model development work, and supervisory, peer, and self ratings will be gathered to provide a second set of Army-wide effectiveness criteria. As mentioned, some of these criteria may serve as in-service predictors as well.


BACKGROUND  ISSUES  AND  RATIONALE 


Issues in Criterion Development

Obtaining accurate measures of individuals' job performance is absolutely critical in personnel selection research (e.g., Dunnette, 1966; Guion, 1965). Too often, considerable time is spent in developing predictor tests and measures at the expense of: (a) identifying performance constructs that should be targets of predictor measures, and (b) actually measuring, in some valid manner, the effectiveness of individual performers on those constructs. Yet test validation results can be meaningful only if proper attention is paid to the "criterion side," so that an accurate depiction of job performance effectiveness is provided. Careful criterion development work should drive the identification/development of predictors in selection, and then also provide measures of performance for predictor validation efforts.

Two types of performance measures should be discussed: objective indices (e.g., for an Army clerical MOS, number of pages typed per eight-hour day and number of typing errors made per page) and performance ratings. Objective indices of a worker's performance are in certain cases preferable to the subjective impressions provided by performance ratings, but good objective measures are hard to acquire (Guion, 1965; Landy & Trumbo, 1980).

The difficulty with the vast majority of objective performance measures is that they are almost invariably deficient and/or contaminated (Guion, 1965; Smith, 1976). By deficient, we mean the measure provides only a partial picture of the worker's effectiveness on the job; that is, there are important aspects of the job left untapped by the objective measure. Referring to the clerical MOS example above, typing speed and accuracy may well be important indices of soldier effectiveness in this MOS, but if helping break in inexperienced typists and willingness to work very hard during heavy production periods are also important for job success, then the former two measures, individually or together, do not adequately measure effectiveness on the job. They are deficient.

Contamination in objective measures occurs when factors that affect how well persons do with respect to the measure are beyond their control. Referring again to the example above, suppose that the number of pages typed in a day depends to some extent on the kind of text the typist is to work on, and the soldier has no control over those assignments. The "number of pages" measure, therefore, provides an impure index of effectiveness; it is contaminated. Unfortunately, these are very common problems with objective performance measures. Identifying or developing good, comprehensive objective indices is very difficult.

Our experience with objective indices of soldier effectiveness in the Army (e.g., AWOL, awards) is that individual measures, on their own, are probably deficient as indicators of effectiveness (Borman, Johnson, Motowidlo, & Dunnette, 1975; Shields, Hanser, Williams, & Popelka, 1981). However, composites of these measures formed within a carefully defined conceptual framework may well provide reasonable measures of effectiveness on important Army-wide criteria.




Of course, ratings of effectiveness have their own set of problems. Briefly, factors that lead to inaccuracies in ratings include the following: (a) ratings are often obtained from persons in a poor position to make judgments about incumbent performance (Borman, 1974; Campbell, Dunnette, Lawler, & Weick, 1970); (b) some raters simply lack the observational and/or judgment skills necessary to make accurate evaluations (Borman, 1979); (c) raters often provide biased ratings, based not so much on performance as on race, sex, background, similarity to the ratee in attitudes, etc. (Hamner, Kim, Baird, & Bigoness, 1974; Terborg & Ilgen, 1975); (d) raters sometimes commit rating errors, such as evaluating everyone as very effective when, in fact, some ratees are performing poorly (Taylor & Wherry, 1951); and (e) raters may fail to use the definitions of the performance dimensions, employing instead their own idiosyncratic beliefs about what it takes to perform effectively, and then rating persons accordingly (Borman & Peterson, in press).

These factors admittedly reflect serious difficulties with ratings, but one often overlooked feature of rating scale development should be mentioned because it is a distinct advantage of the method: the ability of a set of well-defined rating dimensions to capture, in a comprehensive manner, all important performance requirements of a job. That is, if the requirements for successful job performance can be articulated at all, they can be represented in a set of performance dimensions. This means that rating scales, if properly developed, have tremendous potential for generating performance scores that reflect a ratee's actual effectiveness on each important dimension of job performance. It also means that the conceptual definition of a job's performance requirements can be very well delineated, given the proper approach to rating scale development. In fact, the conceptual definition of job performance, resulting in a set of carefully defined performance dimensions, can provide a framework for criterion measurement in general (both ratings and other criterion indices, such as objective measures).

In other words, the researcher can first gain a definition of the job's performance requirements through rating scale development, and then use both ratings and objective measures to index performance on these important requirements. Of course, in this discussion, we are focusing on using rating dimensions as definitions of job performance requirements. A separate question pertains to the ratings themselves on these performance dimensions; as mentioned previously, ratings pose formidable problems.

On balance, we believe that ratings of soldier performance and effectiveness can be useful in this project. If a few simple but very important principles are followed in gathering performance ratings, the accuracy of these ratings can be maximized. These principles include: (a) developing the rating scales with great care taken to reflect all important performance requirements of the job; (b) creating dimensions that are clearly performance-related and that represent performance factors raters can readily observe in ratees; (c) providing clear, simple directions for using the rating scales; (d) gathering the ratings for research purposes only (rather than for any administrative purpose), and making this clear to the raters; (e) selecting raters who have good opportunity to observe ratee performance, which may mean selecting peers as well as supervisors of the persons to be rated; and (f) where possible, collecting ratings of each ratee from more than one rater so that interrater agreement may be assessed to provide at least a rough estimate of the accuracy of the ratings. Our experience suggests that when attention is given to such principles, reasonably high quality ratings are likely to emerge (Borman & Peterson, in press).
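Principle (f) calls for at least a rough estimate of rating accuracy from interrater agreement. The plan does not say which agreement statistic will be used; one common choice is the one-way intraclass correlation. The sketch below is a minimal illustration under two assumptions: every ratee receives the same number of ratings, and the raters differ from ratee to ratee (as with peers). The function name `icc_oneway` is hypothetical.

```python
from statistics import mean


def icc_oneway(ratings: list[list[float]]) -> tuple[float, float]:
    """Return (ICC(1), ICC(1,k)) for an n-ratee by k-rating matrix.

    Assumes a one-way random-effects design: each row holds the k ratings
    one ratee received, and raters are not the same people across rows.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 ratees, each with the same k >= 2 ratings")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-ratee mean square: how much ratees differ from one another.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-ratee mean square: how much raters of the same ratee disagree.
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    single = (msb - msw) / (msb + (k - 1) * msw)  # reliability of one rating
    average = (msb - msw) / msb                   # reliability of the mean of k ratings
    return single, average
```

ICC(1) estimates how reliable a single rater's judgment is, while ICC(1,k) estimates the reliability of the averaged ratings, which is why gathering several ratings per ratee tends to yield a more dependable criterion score.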

As mentioned, objective measures may also prove to be useful indices of soldier effectiveness. Relying on objective indices to measure all of the performance criterion domain is, we believe, unrealistic. However, targeting selected indices, and especially composites of these indices, to measure aspects of performance and effectiveness is likely to be more fruitful. An example might be tapping soldier effectiveness related to Army discipline. A composite of AWOL, Article 15, and other discipline-oriented indicators might serve as a reasonable index of effectiveness in this arena.
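The plan does not fix how such a composite would be weighted. One simple possibility is a unit-weighted composite of standardized indicators, with the sign reversed so that higher scores mean a better disciplinary record. The minimal sketch below assumes that scheme. The field names (`awol_days`, `article15_count`) and the example records are hypothetical.

```python
from statistics import mean, pstdev


def unit_weighted_composite(records: list[dict[str, float]], fields: list[str]) -> list[float]:
    """Mean z-score across `fields`, sign-reversed so higher = fewer incidents.

    Standardizing each indicator within the sample keeps one index from
    dominating merely because of its scale (days vs. counts).
    """
    z_columns = []
    for field in fields:
        values = [rec[field] for rec in records]
        mu, sd = mean(values), pstdev(values)
        # An indicator with no variance in the sample carries no information.
        z_columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [-mean(col[i] for col in z_columns) for i in range(len(records))]


# Hypothetical records for four soldiers.
soldiers = [
    {"awol_days": 0, "article15_count": 0},
    {"awol_days": 0, "article15_count": 1},
    {"awol_days": 5, "article15_count": 2},
    {"awol_days": 0, "article15_count": 0},
]
scores = unit_weighted_composite(soldiers, ["awol_days", "article15_count"])
```

Unit weighting is only a starting point; differential weights could be substituted once the conceptual framework indicates which indicators matter most.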

In addition, we believe that certain objective "outcome" indicators might be best used by considering categories of these variables. For example, attrition is a very broad outcome variable in the sense that there are many different reasons for leaving the service. Further, attrition for different reasons, e.g., for medical reasons versus for disciplinary reasons, has very different implications for the kinds of skills, abilities, personal characteristics, etc., that might be relevant as predictors of these outcomes. Accordingly, working with categories of such broad, complex outcome variables (reenlistment is another example) should lead to more conceptually appropriate predictor-criterion links and also to higher validity coefficients for the predictors. In short, using this category approach to measuring certain outcome variables seems to have more merit for selection research than treating each such variable as a single criterion. Thus, developing/identifying composites targeted toward specific aspects of the criterion performance domain, and working with outcome variable categories, should make the most effective use of objective indices of performance in this research.
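To make the category approach concrete, the sketch below splits a single attrition outcome into separate category-specific criterion indicators. Each category, such as attrition for medical reasons or for disciplinary reasons, can then be validated against predictors on its own. The reason codes and their grouping are hypothetical placeholders; the actual categories would come out of the records review.

```python
from typing import Optional

# Hypothetical separation-reason codes grouped into broad categories.
REASON_TO_CATEGORY = {
    "MED": "medical",
    "MISCONDUCT": "disciplinary",
    "AWOL": "disciplinary",
    "PERF": "performance",
}


def category_criteria(reasons: dict[str, Optional[str]]) -> dict[str, dict[str, int]]:
    """Turn soldier id -> separation reason (None if still serving) into one
    0/1 criterion per attrition category. Unrecognized codes go to 'other'."""
    categories = sorted(set(REASON_TO_CATEGORY.values()) | {"other"})
    criteria: dict[str, dict[str, int]] = {c: {} for c in categories}
    for soldier, code in reasons.items():
        category = None if code is None else REASON_TO_CATEGORY.get(code, "other")
        for c in categories:
            criteria[c][soldier] = int(category == c)
    return criteria
```

Each resulting indicator can then be correlated with predictors separately instead of pooling all attrition into one criterion.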

Rationale  for  Task  4  Research 

In our approach, we propose to address the issues and problems of criterion development by building an inductive model of soldier effectiveness, which includes elements of general effectiveness as a soldier and what we have called worth to the Army. By "inductive," we mean that we have no specific hypotheses about the dimensions of soldier effectiveness that may emerge from Task 4 research; we have only preliminary ideas about some of the domains that might be included.

Preliminary hypotheses about the model, elaborated in the Task 4 proposal, suggest that elements represented might include organizational commitment, organizational socialization, and morale. Also to be included in the model are those aspects of soldier job performance that cut across MOS and are therefore important for soldier effectiveness on the job no matter what the specific MOS.

The model is likely to contain multiple dimensions, with various personal characteristics/attributes responsible for performance on them. For example, job knowledge might be important for performance or effectiveness on some dimensions; skills or abilities may be important for other dimensions; and motivation might be important for still other dimensions in the model.

The general idea is for the model's dimensions of soldier effectiveness, derived primarily from behavioral analysis workshops described in the upcoming PROCEDURE section, to provide a framework for development of the actual measures of effectiveness. The dimensions can serve directly as rating scales for superiors and peers to evaluate soldiers in the research. In addition, objective measures might be identified or even developed to tap effectiveness on some of the dimensions.

This multimethod approach to measuring performance and effectiveness will be part of a careful construct validation strategy in criterion development work (James, 1973; Smith, 1976). We intend to use the most conceptually appropriate source of performance/effectiveness information to index each element or dimension in the model, but in addition, more than one method (i.e., peer ratings, supervisor ratings, self-ratings, administrative indices, etc.) will be employed whenever possible in this measurement effort. Multitrait-multimethod analyses (Kavanagh, MacKinney, & Wolins, 1971; Lawler, 1967) can then proceed to assess the construct validity of our measures.
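In a multitrait-multimethod analysis, the same traits (here, effectiveness dimensions) are each measured by several methods (e.g., peer and supervisor ratings). Under the usual Campbell and Fiske logic, convergent validity shows up as high same-trait, different-method correlations. Discriminant validity shows up when those correlations exceed the correlations between different traits. The minimal sketch below just averages the three kinds of off-diagonal correlations. The trait names, methods, and correlation values are made up for illustration only.

```python
from statistics import mean

Measure = tuple[str, str]  # (trait, method)


def mtmm_summary(correlations: list[tuple[Measure, Measure, float]]) -> dict[str, float]:
    """Average the correlations falling in each type of MTMM block."""
    blocks: dict[str, list[float]] = {
        "monotrait_heteromethod": [],    # convergent validities
        "heterotrait_monomethod": [],    # correlations sharing a method
        "heterotrait_heteromethod": [],  # different trait, different method
    }
    for (t1, m1), (t2, m2), r in correlations:
        if t1 == t2 and m1 != m2:
            blocks["monotrait_heteromethod"].append(r)
        elif t1 != t2 and m1 == m2:
            blocks["heterotrait_monomethod"].append(r)
        elif t1 != t2:
            blocks["heterotrait_heteromethod"].append(r)
        # Same trait and same method would be a reliability entry; skipped here.
    return {name: mean(rs) if rs else float("nan") for name, rs in blocks.items()}


# Made-up correlations: two dimensions, each rated by peers and supervisors.
example = [
    (("discipline", "peer"), ("discipline", "supervisor"), 0.60),
    (("effort", "peer"), ("effort", "supervisor"), 0.50),
    (("discipline", "peer"), ("effort", "peer"), 0.40),
    (("discipline", "supervisor"), ("effort", "supervisor"), 0.30),
    (("discipline", "peer"), ("effort", "supervisor"), 0.20),
    (("discipline", "supervisor"), ("effort", "peer"), 0.10),
]
summary = mtmm_summary(example)
```

In this made-up case, the convergent average (0.55) exceeds both heterotrait averages (0.35 and 0.15), which is the pattern one would hope to see. A heterotrait-monomethod average close to the convergent one would instead point to substantial method (rater-source) variance.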

Another theme of the criterion development work will be attention to the accuracy of performance and effectiveness measures used in the research project. The recent focus on accuracy in ratings, in addition to psychometric error in these measures (e.g., Bernardin & Pence, 1980; Borman, 1979), will be attended to during criterion development work. Whenever feasible,
ratings and other measures of individual soldier effectiveness will be


SPECIFIC  OBJECTIVES 


1. Gather/analyze performance and related records data on the FY81/82 cohort to aid the Task 1 staff in evaluating the validity of the ASVAB and other available predictors.

2. Develop a model of soldier effectiveness, a conceptual definition of across-MOS soldier performance and worth to the Army.

3. Develop rating scales and scale administration materials for superiors, peers, and self to use in evaluating soldier effectiveness on the model's dimensions.

4. Develop objective composite measures of soldier effectiveness/worth to the Army.

5. Identify attrition and reenlistment categories (e.g., attrition for disciplinary reasons, bars to reenlistment) to serve as outcome criteria of soldier effectiveness/worth to the Army.

6. Develop/identify in-service predictors of second-tour soldier performance.

7. Gather ratings and objective/outcome criterion data on first- and second-tour cohorts to aid the Task 1 staff in evaluating the validity of pre-induction predictor measures and the in-service predictors.

8. Obtain scaled utilities for MOS performance levels and jobs.


OVERALL  SUMMARY  OF  THE  PROCEDURE 


To  accomplish  these  research  objectives,  the  following  steps  are  required: 

Information search and evaluation of instruments. We are reviewing the performance measurement literature and examining performance measurement methods currently used in the Army. Also, the Master Plan and Research Plan for this task are being prepared.

Develop prototype instruments. One important activity in this research is the development of a model of soldier effectiveness, which will represent a behavioral definition of a broad range of effectiveness dimensions related to across-MOS, Army-wide soldier performance. From the dimensions developed here, we will derive behavioral rating scales to help evaluate individual soldier effectiveness in Task 4 research. We will also develop composites of the administrative indices intended to tap aspects of soldier effectiveness within the model. Finally, we will conduct exploratory research on the combat effectiveness of individual soldiers.

FY81/82  cohort  records  collection,  data  analyses,  and  report  writing.  We 
will  examine  records  data  available  for  the  FY81/82  cohort  and  collect  data 
on  promising  records  variables  for  a  selected  sample  of  cohort  members. 
Data will be analyzed to evaluate their usefulness as Army-wide criterion measures against which to assess the validity of the ASVAB (the validity assessment itself is a Task 1 activity). We will also write draft and final reports on the results of these analyses.


Refine existing instruments. This step involves: (a) refining available Army-wide criteria that appear potentially useful for soldier effectiveness measurement; (b) examining attrition and reenlistment categories, reflecting different reasons for leaving the Army and different classes of reenlistment; (c) developing Army-wide measures to tap second-tour soldier (NCO) effectiveness; and (d) developing in-service predictors, i.e., first-tour criterion measures that can be used to predict success in second-tour performance.

Prepare for and conduct field tests/revise instruments. We will conduct a series of four field tests on: (a) the first-tour soldier effectiveness rating scales derived from the model, along with the rating scale administrative package; (b) the administrative indices and composites of these indices intended to measure aspects of soldier effectiveness; (c) the second-tour performance/effectiveness measures; and (d) the in-service predictors. For the most part, the materials will be revised sequentially, with improvements made in the instruments and supporting materials after each field test period.

FY83/84 cohort first-tour data. Findings in the field tests will result in revised versions of the soldier effectiveness rating scales and a scoring system for the administrative index measures. The scales will be administered to raters evaluating soldiers in this cohort sample, and sample members will also be scored on the administrative indices, including attrition and reenlistment, as appropriate. Criterion data will be analyzed to evaluate their distributions, reliability, convergent and discriminant validity, and, where possible, accuracy in indexing "true" performance levels. We will also prepare draft and final reports on data analysis results.

Revise instruments. This step will allow for a staff review of the early research on Task 4 criteria, in-depth discussions with ARI and Army officials about the current status of the criterion instruments, and final revisions of these instruments and supporting materials before FY83/84 second-tour and FY86/87 first-tour data collection.

FY83/84 cohort second-tour and FY86/87 cohort first-tour data. We will administer the revised rating scales (first and second tour, as appropriate), score sample members on the administrative indices, and obtain attrition and reenlistment data on appropriate sample members. Data will be analyzed as before, with attention given to distributions of scores, reliability, convergent and discriminant validity, and, where possible, accuracy of these effectiveness measures.

Obtain scaled utilities for MOS performance levels and jobs. The established procedures of multi-attribute utility theory are one approach that can be applied to develop a general model for obtaining and scaling utilities that reflect the relative importance of various MOS at different levels of performance. The model can be implemented in the form of an integrated set of programs run on a microcomputer. These programs present instructions, stimulus material, and assessment and response procedures as appropriate for deriving, analyzing, and maintaining relative utility measures for each MOS and its corresponding performance levels. The software can be used at various data collection centers where appropriate content area specialists and senior officers are available.
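Multi-attribute utility theory is named above only as one candidate approach. Its simplest form is a weighted additive model. Each option (here, an MOS at a given performance level) receives a single-attribute utility on each attribute, and overall utility is the importance-weighted sum of those utilities. The minimal sketch below assumes that form, with weights obtained by normalizing raw importance judgments so that they sum to one. The attribute names and numbers are hypothetical; the plan leaves the actual attributes and elicitation procedure to exploratory research.

```python
def normalize_weights(raw_importance: dict[str, float]) -> dict[str, float]:
    """Rescale raw importance judgments so the weights sum to 1."""
    total = sum(raw_importance.values())
    if total <= 0:
        raise ValueError("importance judgments must sum to a positive value")
    return {attr: value / total for attr, value in raw_importance.items()}


def additive_utility(weights: dict[str, float], scores: dict[str, float]) -> float:
    """Weighted additive utility; `scores` are single-attribute utilities (0-100)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"no score for attributes: {sorted(missing)}")
    return sum(weights[attr] * scores[attr] for attr in weights)


# Hypothetical attributes and expert judgments for one MOS at a high performance level.
weights = normalize_weights({"mission_criticality": 60, "unit_readiness": 30, "replacement_difficulty": 10})
high_performer = {"mission_criticality": 90, "unit_readiness": 80, "replacement_difficulty": 50}
u = additive_utility(weights, high_performer)  # 0.6*90 + 0.3*80 + 0.1*50
```

The additive form assumes the attributes contribute independently to overall utility; if expert judgments showed strong interactions, a more elaborate multi-attribute model would be needed.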

The exact procedures that will be used to obtain the scaled utilities will be worked out through exploratory research. Current plans (see Subtask 4.9) call for accomplishing the scaling in four main steps:

(1) Development of performance construct measures (from those available for each MOS);

(2) Development of utilities within each MOS;

(3) Rescaling the utility of each MOS into a common scale; and

(4) Assigning dollar values to the performance utility levels.

Research will be conducted prior to the completion of each of these steps in order to determine how best to accomplish them.
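Steps (2) through (4) can be illustrated with a deliberately simple scheme. It is offered only as an assumption, since the actual procedures are left to exploratory research. Within-MOS utilities run from 0 (lowest performance level) to 100 (highest). Each MOS is then multiplied by its relative importance, expressed as a fraction of the most important MOS, which places all MOS on one common scale. Finally, a single dollars-per-point conversion assigns dollar values. The MOS labels, importance weights, and dollar figure are all hypothetical.

```python
Levels = dict[str, float]  # performance level -> utility


def to_common_scale(within_mos: dict[str, Levels], mos_importance: dict[str, float]) -> dict[str, Levels]:
    """Step 3: scale each MOS's 0-100 utilities by its importance relative to the top MOS."""
    top = max(mos_importance.values())
    return {
        mos: {level: u * mos_importance[mos] / top for level, u in levels.items()}
        for mos, levels in within_mos.items()
    }


def to_dollars(common: dict[str, Levels], dollars_per_point: float) -> dict[str, Levels]:
    """Step 4: convert common-scale utilities to dollars with one linear anchor."""
    return {
        mos: {level: u * dollars_per_point for level, u in levels.items()}
        for mos, levels in common.items()
    }


# Hypothetical output of step 2: utilities within each MOS.
within = {
    "MOS_A": {"low": 0, "mid": 60, "high": 100},
    "MOS_B": {"low": 0, "mid": 50, "high": 100},
}
common = to_common_scale(within, {"MOS_A": 1.0, "MOS_B": 0.5})
dollars = to_dollars(common, dollars_per_point=1000.0)
```

A linear dollar conversion is the simplest possible choice. Whether utility should map to dollars linearly is itself one of the questions the planned research would need to settle.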

Prepare  final  reports.  Our  staff  will  prepare  draft  and  final  technical 
reports  describing  all  Task  4  research. 

The next section provides a detailed description of the procedural steps to be taken in the Task 4 research. Interrelationships among the research subtasks can be seen in Figure 4-1.


Figure  4-1  Task  4  Milestone  Chart 


PROCEDURE 


Subtask 1: Information Search and Evaluation of Instruments

We are: (a) reviewing the current performance/effectiveness measurement system, (b) conducting a literature review, and (c) developing the research plan and master plan.


Review current system. The first activity in this subtask involves a review of the current performance measurement system in the Army. We are especially concerned with administrative records presently kept on enlisted soldiers. These records may yield performance information to help index individual soldier effectiveness in the Task 4 research.

In order to evaluate their potential usefulness for providing Army-wide criterion data, during the first six months of the research we reviewed the Enlisted Evaluation Report, the Enlisted Master File (EMF), the Individual Enlisted 201 File, and other available record sources. This activity is in preparation for Subtask 3 interviews with persons familiar with these record sources. Those sources that appear promising in terms of potential for providing meaningful criterion information will be the target of the Subtask 3 interviews.

Conduct literature review. In addition to reviewing current data sources related to potential Task 4 criteria, we are conducting a literature review of books, articles, and technical reports relevant to Task 4. The main purpose of this review is to ensure that no criterion development method, measuring instrument, or research idea related to Task 4 is overlooked. We are reviewing both military and civilian research on performance measurement, criterion development, and all other topics relevant to Task 4. Performance dimensions surfacing from the military literature will be forwarded to Tasks 2 and 3 as part of a package to be prepared in September 1983.

Compilation and evaluation of cost information. Virtually all validation research faces the problem of translating basic information about increments in validity into indices that are meaningful to, and interpretable by, the organization. In this project, that will be done by a variety of means. For example, graphic displays of regression functions, decision tables and expectancy charts, and changes in organizational variables such as attrition, reenlistment rates, and frequencies of administrative actions (e.g., disciplinary actions) will be used where appropriate.
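An expectancy chart translates a validity coefficient into the proportion of people in each predictor score band who can be expected to succeed on the criterion. The minimal sketch below rests on two assumptions: predictor and criterion are bivariate normal with correlation `r`, and "success" means a criterion standard score above a chosen cutoff. The function name and the band and cutoff choices are illustrative only. It integrates the conditional success probability numerically over equal-frequency predictor bands, using only the standard library's `NormalDist`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def expectancy_table(r: float, n_bands: int = 4, success_z: float = 0.0,
                     steps: int = 4000) -> list[float]:
    """Proportion expected to exceed `success_z` on the criterion within each
    equal-frequency predictor band (lowest band first), assuming predictor and
    criterion are bivariate normal with correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    inner_cuts = [STD_NORMAL.inv_cdf(i / n_bands) for i in range(1, n_bands)]
    edges = [-8.0] + inner_cuts + [8.0]  # +/-8 SD stands in for +/-infinity
    resid_sd = (1.0 - r * r) ** 0.5      # SD of criterion given predictor
    table = []
    for lo, hi in zip(edges, edges[1:]):
        h = (hi - lo) / steps
        mass = 0.0
        for i in range(steps):
            x = lo + (i + 0.5) * h  # midpoint rule
            p_success = 1.0 - STD_NORMAL.cdf((success_z - r * x) / resid_sd)
            mass += STD_NORMAL.pdf(x) * p_success * h
        table.append(mass * n_bands)  # divide by the band's probability, 1/n_bands
    return table


chart = expectancy_table(0.5)
```

Comparing the tables for a current and an improved predictor battery shows, band by band, how much a validity gain changes the expected success rate. That comparison is the kind of display meant to make validity increments interpretable.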

It should also be possible to attach cost figures to a number of outcomes that are part of the Project A Army-wide criterion assessment efforts. While we do not have the resources to do extensive cost analyses as part of this project, it will be possible to examine information that currently exists or to make use of cost estimation methods already developed by Army managers and other scientists within ARI. Consequently, at about the same time that we begin to develop information on archival criteria, we will conduct interviews with the relevant staff personnel to determine what information is available on the following:

1.  Per  person  training  costs  for  different  MOS. 

2. Recruiting costs per individual.




3.  Costs  associated  with  various  categories  of  attrition. 

4.  Costs  associated  with  various  administrative  actions. 

If some reasonable set of cost figures can be compiled from the above, then it will be possible first to translate validity coefficients into expectancy tables and then to attach cost figures to the reduction in prediction errors that is achieved using a particular set of predictors, or prediction algorithm. These cost figures can then be aggregated over whatever period of time is deemed appropriate. Schmidt, Hunter, and Pearlman (1982) have convincingly argued that in a large organization even small increments in validity can yield enormous cost savings. Depending on how complete they are, the cost figures can also be used in the development of dollar equivalencies for performance utility.
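The argument that small validity increments can produce large savings is usually formalized with the Brogden-Cronbach-Gleser utility model, the framework used in Schmidt and Hunter's utility analyses. In that model, the dollar gain is the product of five quantities: the number selected, the years of tenure, the validity increment, the dollar standard deviation of performance (SD_y), and the mean standardized predictor score of those selected. Any added testing cost is then subtracted. The minimal sketch below assumes top-down selection from a normally distributed applicant pool. All input values are hypothetical, and SD_y in particular would have to come from cost-estimation work like that described above.

```python
from statistics import NormalDist


def mean_z_selected(selection_ratio: float) -> float:
    """Mean predictor z-score of applicants selected top-down: pdf(cutoff) / SR."""
    if not 0 < selection_ratio < 1:
        raise ValueError("selection_ratio must be in (0, 1)")
    std = NormalDist()
    cutoff = std.inv_cdf(1 - selection_ratio)
    return std.pdf(cutoff) / selection_ratio


def utility_gain(n_selected: int, validity_gain: float, sd_y: float,
                 selection_ratio: float, years: float = 1.0,
                 added_cost: float = 0.0) -> float:
    """Brogden-Cronbach-Gleser dollar gain from an increment in validity."""
    return (n_selected * years * validity_gain * sd_y
            * mean_z_selected(selection_ratio) - added_cost)


# Hypothetical inputs: 1000 selected, validity up by .05,
# SD_y of $10,000, half of applicants accepted.
gain = utility_gain(1000, 0.05, 10_000.0, 0.5)
```

The gain is linear in both the number selected and the validity increment. As a result, even a modest increment applied across a large annual intake of soldiers produces a large total, which is the point Schmidt, Hunter, and Pearlman make.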

Develop research plan and master plan. Finally, within this subtask, we are preparing the Task 4 Research and Master Plans. The Research Plan describes all proposed project activities, the rationale for each activity, troop support requirements for the research, and the scientific and operational outcomes anticipated from the research. The Master Plan details the project staff resources planned for each research activity, along with the travel and other direct costs projected for each of these activities.

Subtask  2:  Develop  Prototype  Instruments 

This subtask involves two major activities: (a) developing the soldier effectiveness model, and (b) generating composites of administrative indices. Model development steps will be described first, and work on the composites will be detailed afterwards.


Development of the soldier effectiveness model. Development of the model will primarily involve an inductive process, consisting of group discussions and workshops with NCOs and officers, within what has been referred to as the critical incidents or behavioral analysis method (Flanagan, 1954; Smith & Kendall, 1963). In the workshops, we will ask for critical incidents or performance examples describing relatively effective (as well as ineffective) behavioral patterns among first-tour enlisted personnel in a wide range of MOS. The NCO/officer participants will be given special guidance to provide examples that could occur in any MOS, such as the following:

o  When on a regular work schedule, this soldier consistently
   reports for work 15 minutes early and asks the first sergeant
   if there's anything he can get started on.

o  This soldier picked on a fellow unit member by intimidating
   him in the barracks in front of several other soldiers.

We  will  also  ask  members  of  the  workshop  groups  to  tell  us  about  general 
behavioral  patterns  that  they  take  into  consideration  when  thinking  about  a 
soldier's  overall  contributions  to  the  Army.  We  will  ask,  for  instance, 
how they recognize soldiers whose first-tour performance indicates that
they  should  be  encouraged  to  reenlist  for  a  second  tour  and  other  soldiers 
who  should  not  be  encouraged  or  who  should  be  prevented  from  reenlisting. 

In the behavioral analysis workshops, a brief orientation program (Borman, Hough, & Dunnette, 1976) will be used to train participants in generating and writing useful behavioral examples. In the program, the workshop leaders will describe the nature of behavioral examples, discuss why generating behavioral examples leads to development of meaningful behavior-based definitions of individual effectiveness, and provide participants with examples of improperly and properly written behavioral examples/critical incidents (e.g., Borman et al., 1976). Workshop participants will then be instructed to begin writing performance examples, and the two research staff members conducting the workshop will help members of the group to ensure they are on the right track. After each participant has generated four or five examples, we will stop the group and discuss with members the preliminary model of soldier effectiveness that appears in the proposal. This model contains elements of organizational commitment (Steers, 1977), organizational socialization (Van Maanen & Schein, 1979), and morale (Motowidlo & Borman, 1977). The preliminary model is intended to suggest constructs that might be considered as reflecting soldier effectiveness/worth to the Army.

Our staff will seek opinions about components of the model and about other
domains that might be included. However, at no time will we force this
model on participants; it will be used only to stimulate discussion about
soldier effectiveness domains that might be important to include in the
model. After the discussion of domains, participants will be asked to
continue writing examples of effective and ineffective soldier behavior,
targeted toward domains they believe to be important for overall soldier
effectiveness/worth to the Army. The output from these workshops will be
several behavioral examples from each participant.


The examples will be edited into a common format and content-analyzed to
form dimensions of Army-wide performance/effectiveness. At least two Task 4
researchers will review the many edited behavioral examples (approximately
1000) and develop categories or dimensions of effectiveness reflecting the
content of the examples.

After this step, a "retranslation" process (Smith & Kendall, 1963) provides
both a check on the dimensions and a sound empirical procedure for
developing behavior-based scales to define this performance/effectiveness
domain. Specifically, we will present our dimensions to the COR and others
he designates to evaluate this depiction of soldier effectiveness. At this
point we will also arrange to have the dimensions reviewed by senior field
grade officers to ensure that we have adequately captured the domain of
soldier effectiveness. We will incorporate suggested changes before moving
to the retranslation process. Once the revisions are made, we will mail all
edited behavioral examples to the workshop participants, along with the
revised dimensions, for the retranslation step.

Briefly, in retranslation, participants in this step will sort each example
into one of the dimensions according to its perceived content and also rate
the effectiveness level of the behavior reflected in the example, e.g.,
from 1 = very ineffective to 7 = very effective. Retranslation confirms (or
disconfirms) the dimension system, based on participant agreement on this
sorting task. It also provides data on the degree of ambiguity in the
behavioral examples' effectiveness levels, so that examples with good
agreement on the effectiveness they represent can be used as behavioral
anchors for the soldier effectiveness dimensions.
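The agreement and ambiguity checks described above can be sketched in a few lines of Python for a single behavioral example. The sketch computes the share of raters who sorted the example into its most common (modal) dimension and the mean and standard deviation of its 1-7 effectiveness ratings. The retention cutoffs (at least 60% agreement, rating SD no larger than 1.5) are illustrative assumptions, not values specified in this plan, and the dimension names in the test data are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

# Illustrative retention cutoffs (assumptions, not taken from this plan).
MIN_AGREEMENT = 0.60   # share of raters placing the example in its modal dimension
MAX_RATING_SD = 1.50   # maximum spread of the 1-7 effectiveness ratings

def retranslate(responses):
    """Summarize retranslation data for one behavioral example.

    responses: list of (dimension, effectiveness_rating) pairs, one per rater.
    """
    counts = Counter(dim for dim, _ in responses)
    modal_dim, n_modal = counts.most_common(1)[0]
    ratings = [rating for _, rating in responses]
    agreement = n_modal / len(responses)
    sd = stdev(ratings) if len(ratings) > 1 else 0.0
    return {
        "dimension": modal_dim,
        "agreement": agreement,
        "mean_rating": mean(ratings),
        "rating_sd": sd,
        # Keep as an anchor only if raters agree on both content and level.
        "retained": agreement >= MIN_AGREEMENT and sd <= MAX_RATING_SD,
    }
```

An example sorted consistently and rated within a narrow band is retained as a scale anchor; one scattered across dimensions or ratings is dropped.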



Retranslation, then, results in a series of dimensions, each anchored by
scaled behavioral examples. The content of successfully retranslated
behavioral examples on each scale is summarized, separately for the
effective and ineffective portions of the scale. The final behavior summary
scales should therefore provide a clear, behavior-based depiction of the
important soldier effectiveness dimensions (see Figure 4-2 for an example
behavioral dimension for the job of Navy recruiter). These dimensions form
the content of the model of soldier effectiveness. In addition to
representing a very important product of Task 4 research, the dimensions
are important for Task 2 predictor development work.
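Before the summary statements are written, the retained examples for each dimension have to be sorted by where their mean ratings fall on the scale. A minimal Python sketch of that bookkeeping follows. The cutoffs (mean of 5.0 or higher is effective, 3.0 or lower is ineffective, anything between is middle) and the dimension names are assumptions for illustration; the summary statements themselves are written by researchers, not generated by code.

```python
from collections import defaultdict

# Illustrative cutoffs on the 1-7 effectiveness scale (assumed, not from the plan).
HIGH_CUTOFF = 5.0
LOW_CUTOFF = 3.0

def group_for_summary(retained_examples):
    """Group retranslated examples by dimension and scale region.

    retained_examples: iterable of (dimension, example_text, mean_rating).
    Returns {dimension: {"effective": [...], "middle": [...], "ineffective": [...]}}.
    """
    scales = defaultdict(lambda: {"effective": [], "middle": [], "ineffective": []})
    for dimension, text, mean_rating in retained_examples:
        if mean_rating >= HIGH_CUTOFF:
            band = "effective"
        elif mean_rating <= LOW_CUTOFF:
            band = "ineffective"
        else:
            band = "middle"
        scales[dimension][band].append(text)
    return dict(scales)

# Hypothetical dimension names and made-up mean ratings.
summary = group_for_summary([
    ("Conscientiousness", "Reports for work 15 minutes early and asks for tasks", 6.4),
    ("Peer relations", "Intimidated a fellow unit member in the barracks", 1.8),
])
```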

Because Task 2 (Subtask 4) requires the dimensions in October 1983 in order
to judge the usefulness of potential predictors, we will forward early
versions of them to the Task 2 researchers according to their
specifications and in time for their October technical review procedure.
The package for Task 2 will include dimensions identified in the literature
review as well as dimensions resulting from early workshops in the model
development effort. For each dimension in the package, we will include a
name or label, a brief definition of the dimension and any additional
explanation of our view of its meaning, and the source from which the
dimension was derived. We will also send this package to Task 3.

Details of troop support for model development. A total of 96 NCOs and
field grade officers are needed to participate in six one-day workshops (16
at each). We believe NCOs and field grade officers will be most appropriate
in these and other Task 4 workshops because they typically have
considerable experience in the Army while also being reasonably close to
the day-to-day operations of Army units and personnel. The NCOs/officers
should be selected to ensure that they are interested and willing to
participate. The mix of MOS and officer specialties reflected in the groups
is not particularly important, as long as a wide variety of
MOS/specialties is represented across the six participant groups. This
configuration should prevent the model from being a narrow, parochial
definition of Army-wide soldier effectiveness.

Figure 4-2

Behavior Summary Scale for Job of Navy Recruiter

The number of participants was carefully considered and should lead to the
generation of the proper number of behavioral examples. Based on our
experience with behavioral scale development, 800-1000 examples are
required to sample the performance domain sufficiently to develop rich
behavioral definitions of the domain (Borman, Hough, & Dunnette, 1976).
Our experience also shows that each of the 96 participants, working with us
for one day, should be able to generate about 10 usable examples (Borman,
Johnson, Motowidlo, & Dunnette, 1975), or roughly 960 examples in all,
within the required range.

The total time required of each NCO/officer will be one and one-half days.
He/she must attend the one-day workshop to generate behavioral examples and
later must complete the retranslation task, which requires reviewing the
dimensions identified for the model, sorting each behavioral example
written in the workshops into a dimension, and rating the effectiveness
level it suggests. The review and ratings should take one-half day of
his/her time.

Recommended locations for the workshops are Forts Hood, Knox, Benning, and
Carson. This site selection should result in a representative mix of
MOS/officer specialties with relatively little travel on the part of
participants.




Proposed timing of the workshops is as follows:

o  Last two weeks in July 1983: Workshops 1 and 2 (Hood)

o  First two weeks in Sept. 1983: Workshops 3 and 4 (Knox & Benning)

o  Last two weeks in Sept. 1983: Workshops 5 and 6 (Carson)

It is important to have one month between the first two workshops and the
later workshops to enable the Task 4 staff to: (a) examine the initial
behavioral examples to ensure that they meet project requirements, and (b)
form preliminary dimensions based on the content of these examples. As
mentioned, these dimensions and definitions will be forwarded to Tasks 2
and 3.

The  schedule  presented  should  allow  sufficient  time  for  this  dimension 
development effort. At the same time, the schedule is "tight" to ensure
efficient  use  of  staff  and  relatively  quick  completion  of  the  model 
development  steps. 

Another activity related to model development is the preparation of (a)
rating scales based on the model, and (b) a rating scale administration
package to aid in gathering rating data. Developing the rating format is
very straightforward: the dimensions, including behavioral definitions,
emerge directly from the model development steps, so no additional work
is required to ready the rating scales.

For the administration package, we envision developing instructions that
enable raters to complete their ratings with maximum ease and minimum
confusion. We will use our past experience in studies involving ratings
(e.g., Borman & Dunnette, 1975; Campbell, Dunnette, Arvey, & Hellervik,
1973) to prepare the best instructions possible, and research on the
rating form and procedures will then be conducted during the field tests
described later in this plan.

Also to be developed is a rater training module to help raters make more
accurate evaluations. As with the rating scale instructions, we will
develop what we believe to be the best training module possible using our
experience in past research (e.g., Borman, 1975; Peterson, Houston, &
Rosse, 1981). Briefly, the module we have worked with describes three
different rating errors (halo, stereotype, and leniency/restriction of
range) and instructs raters on the use of behavioral definitions when
making their evaluations. This material is presented on one or two pages
and is easy for the lay person to understand. The trainer also discusses
these errors in common-sense terms, assures the raters that the evaluations
will be used for research purposes only, and answers questions about the
rating form and the project in general. As with the administration
package, we will conduct field tests of this and similar modules to further
improve the rater training component.
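Field tests of rater training typically look for these errors in the ratings themselves. The Python sketch below uses common rough operationalizations that are assumptions here, not the contents of the training module: leniency as the distance of a rater's mean rating above the scale midpoint, restriction of range as the standard deviation of all of the rater's ratings, and halo as the average correlation between dimensions across the soldiers rated. Stereotype error is not captured by these indices.

```python
from itertools import combinations
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson r between two equal-length lists; nan if either is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

def rating_error_indices(ratings, midpoint=4.0):
    """ratings[ratee][dimension]: one rater's 1-7 ratings of several soldiers."""
    scores = [s for row in ratings for s in row]
    n_dims = len(ratings[0])
    columns = [[row[d] for row in ratings] for d in range(n_dims)]
    pair_rs = [pearson(columns[i], columns[j])
               for i, j in combinations(range(n_dims), 2)]
    return {
        # Positive: ratings sit above the scale midpoint (leniency).
        "leniency": mean(scores) - midpoint,
        # Small: ratings bunched in a narrow band (restriction of range).
        "spread_sd": pstdev(scores),
        # Near 1: dimensions rated as if they were one trait (halo).
        "halo_mean_r": mean(pair_rs),
    }
```

Indices like these are only screening aids; a high dimension intercorrelation can also reflect true covariation among the dimensions.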


Pilot development of special Combat Performance Prediction Scales. The
major rating scale development work in this subtask will focus on the model
development steps just detailed. However, we plan an exploratory
investigation to determine the feasibility of constructing Combat
Performance Prediction Scales that might be used to predict performance in
combat. Such scales would be designed to evaluate expected performance
under degraded environmental conditions, communication, support, etc., and
under the increased confusion, workload, and uncertainty of a combat
environment. Such conditions would be expected for many soldiers near a
battle area, even though it is likely that only a small percentage of the
total Army force will directly participate in combat. Thus, soldier
effectiveness under combat conditions represents a potentially important
part of the total soldier effectiveness domain.

Still another reason for considering this an exploratory effort, and for
concentrating on the soldier effectiveness domains described previously
(i.e., across-MOS performance, organizational commitment and
socialization, etc.), is that we may be asking raters to perform a rating
task they are incapable of doing well. With these scales, we will be asking
raters to evaluate the likely performance of individual soldiers on
dimensions relevant to a combat situation. This requires considerable
inference on the part of the raters, because we are asking them to observe
garrison/field performance and effectiveness and to estimate effectiveness
in a very different setting.

Nonetheless, we plan to conduct this exploratory work in an attempt to form
dimensions of combat performance and to develop the Combat Performance
Prediction Scales. These scales, like the soldier effectiveness model
dimensions, will be appropriate for any MOS. The dimensions will be
developed in three one-day workshops with a total of 30 senior NCOs and
field grade officers participating (10 in each session). These
participants should have experience in combat environments.


In each workshop, two Task 4 researchers will describe the objectives of
the session and then ask participants to consider effective and ineffective
soldier performance in combat. Participants will be encouraged to describe
to the group the kinds of performance that seem to differentiate those who
prove to be effective in combat from those who are less effective. During
the group discussion we will guide participants toward identifying and
defining dimensions of combat performance. In the second and third
workshops, we will also present to the groups what we have gleaned from
earlier sessions. Participants will be asked to comment on the dimension
names and definitions and to make revisions as needed. The product to
emerge from this series of sessions will be a set of well-defined
dimensions that can be used as "predicted performance" rating scales to
predict the combat performance of individual soldiers.

The three workshops should be conducted at the Pentagon. As before,
workshop participants should be selected to ensure they are interested in
the project and well motivated to participate. We strongly suggest 10
participants in each workshop because this is a small enough number to get
everyone's point of view and yet large enough to allow (across the three
workshops) reasonably good representation in terms of specialties.

Development of administrative index composites. Regarding this development
work, Subtask 1 activities should yield candidate indices for these
composites. Interviews conducted in Subtask 3 will suggest which indices
appear most promising for further examination, and the preliminary analyses
of records data, also accomplished in Subtask 3, should provide more
definitive guidance on the usefulness of individual administrative indices.

However, as argued in our proposal, we see considerable merit in forming
composites of selected indices to tap elements of the soldier effectiveness
model, and work in Subtask 2 is intended to accomplish this objective.

Briefly, the argument is this. There is a serious difficulty in using
administrative records as soldier effectiveness criteria, since they often
reflect only exceptionally good or exceptionally poor performance. These
records therefore have low base rates (i.e., they appear infrequently in a
soldier's records) and very little variance (i.e., nearly everyone has
about the same "scores" on them). Consider, for example, AWOL on the poor
performance side and special awards on the good performance side. This
skewness seriously constrains the usefulness of an administrative variable
as a criterion of soldier effectiveness (cf. Hammer & Landau, 1981).
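The skewness point can be made concrete. For a record coded 0/1 (for example, "ever AWOL") with base rate p, the variance is p(1 - p) and the skewness is (1 - 2p)/sqrt(p(1 - p)); these are standard Bernoulli results. The short Python sketch below evaluates them at a few illustrative base rates, which are not values drawn from Army records.

```python
from math import sqrt

def binary_indicator_stats(p):
    """Variance and skewness of a 0/1 record with base rate p (0 < p < 1)."""
    variance = p * (1 - p)
    skewness = (1 - 2 * p) / sqrt(variance)
    return variance, skewness

for p in (0.50, 0.10, 0.03):
    var, skew = binary_indicator_stats(p)
    print(f"base rate {p:.2f}: variance {var:.4f}, skewness {skew:.2f}")
```

As the base rate falls from .50 to .03, the variance shrinks roughly eightfold while the distribution becomes strongly right-skewed.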

One strategy for dealing with the problem of low base rates is to combine
records of different kinds of events and actions into more general
indices. One way to do this is to examine patterns of correlations between
different records (provided, of course, that there is enough variance to
permit at least some patterns of covariation to emerge) and combine those
that are empirically related. A second way, which might still be possible
even when empirical relationships between records cannot be detected, is
to combine different elements that are judged to be conceptually similar.
For example, it may become clear that several kinds of awards should be
combined into one index because they all indicate organizational
recognition for outstanding performance in some psychologically
homogeneous behavioral domain. A soldier's "score" would then be the total
number of awards received in that particular category.
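The second, conceptual approach can be sketched as summing counts across record types judged to belong to one category, which raises the share of soldiers with a nonzero score. In the Python below, the record-type names and the ten soldiers' records are made up for illustration; they are not Enlisted Master File or 201 File codes or data.

```python
# Hypothetical record types judged to reflect organizational recognition.
RECOGNITION_RECORDS = ("award_type_a", "award_type_b", "honor_graduate")

def category_index(record, members=RECOGNITION_RECORDS):
    """Total count of events in one conceptually homogeneous category."""
    return sum(record.get(name, 0) for name in members)

def base_rate(values):
    """Share of soldiers with a nonzero value."""
    return sum(1 for v in values if v > 0) / len(values)

# Made-up records for ten soldiers; absent keys mean zero events.
soldiers = [
    {"award_type_a": 1},
    {"honor_graduate": 1},
    {"award_type_b": 2},
    {}, {},
    {"award_type_a": 1, "award_type_b": 1},
    {}, {}, {}, {},
]

for name in RECOGNITION_RECORDS:
    print(name, base_rate([s.get(name, 0) for s in soldiers]))
print("combined", base_rate([category_index(s) for s in soldiers]))
```

In this toy data each separate record type has a base rate of .10 or .20, while the combined category index reaches .40.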


How this combining of individual objective indices might offer a good
approach can be explored using data recently presented by Shields, Hanser,
Williams, and Popelka (1981). They gathered information on soldier
effectiveness in the 193rd Infantry Brigade, Panama. Data were collected on
such variables as SQT scores, number of awards, number of military courses
completed, number of times honor graduate status was attained in training
courses, number of Article 15s, and number of letters of appreciation. One
result of the research was that positive correlations emerged between some
criterion pairs, for example, SQT scores and number of awards (r = .43) and
number of awards and number of military courses completed (r = .63). This
indicates that these different indices may indeed reflect, to some extent,
an underlying effectiveness construct. Of course, relationships between
other pairs of indices are low, but what we suggest here is that low base
rates may be an important contributor to the low correlations in some
cases. For example, "number of times honor graduate status attained" has a
mean of .03 across some 125 soldiers, and this low base rate reduces the
likelihood of substantial correlations between this variable and other
variables. However, if scores on this measure are combined with scores on
other low base rate but conceptually similar measures (i.e., measures of
what appears to be the same underlying construct), the base rate might
well improve to a level where significant correlations with other
variables would be much more likely.
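The effect of a low base rate on correlations can be quantified for dichotomous indicators. When two 0/1 variables have base rates p1 and p2, the largest attainable phi coefficient (their Pearson r) is sqrt(p_lo(1 - p_hi) / (p_hi(1 - p_lo))), where p_lo is the smaller rate. The Python sketch below applies this standard bound, treating the honor-graduate mean of .03 as roughly a 3% base rate; that treatment, and the comparison variable with a .50 rate, are assumptions for illustration.

```python
from math import sqrt

def max_phi(p1, p2):
    """Upper bound on phi between two 0/1 indicators with base rates p1, p2
    (0 < p1, p2 < 1). It is reached when every soldier with the rarer event
    also has the more common one."""
    lo, hi = sorted((p1, p2))
    return sqrt(lo * (1 - hi) / (hi * (1 - lo)))

print(round(max_phi(0.03, 0.50), 2))  # prints 0.18
print(round(max_phi(0.20, 0.50), 2))  # prints 0.5
```

Even a perfectly related 3% indicator cannot correlate above about .18 with a 50/50 indicator, whereas pooling it into a category with a 20% base rate raises the ceiling to .50.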

A final issue here is the problem of weighting different records so that
the more important or organizationally significant pieces of information
play a heavier role in determining a soldier's total score on an
aggregated index. We must, essentially, seek to determine how many
"points" a soldier should get for each of several kinds of awards, for
example, and how many "points" should be taken away for AWOL, Article 15,
and different punishments. We hope, then, to develop a rationale for
constructing one or more weighted indices of soldier effectiveness in the
context of Army-wide performance. Indices generated in this manner will
probably show more variance than the raw individual records that make them
up, and they should therefore be more useful as criterion measures of
Army-wide performance.
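A points-based index of the kind just described reduces to a weighted sum of event counts. In the Python sketch below, the record-type names and point values are hypothetical placeholders chosen for illustration; the plan leaves the actual weights to the officer/NCO workshops.

```python
# Hypothetical point values; actual weights are to come from the workshops.
POINTS = {
    "award": 3,
    "letter_of_appreciation": 1,
    "article_15": -4,
    "awol": -5,
}

def weighted_index(counts, points=POINTS):
    """Sum of (points per event x number of events) over the record types
    in `points`; record types absent from `counts` contribute zero."""
    return sum(weight * counts.get(kind, 0) for kind, weight in points.items())

print(weighted_index({"award": 2, "letter_of_appreciation": 1, "article_15": 1}))
```

Here two awards and one letter earn 7 points and one Article 15 removes 4, for a total of 3.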

To accomplish this step, we will prepare for and then conduct two different
one-day workshops with officers and NCOs (details of troop support
requirements follow). In preparation for these sessions we will develop a
briefing package that describes in lay terms the statistical strengths and
weaknesses of each candidate effectiveness index. Base rate and missing
data information, correlations between indices (when available), and any
other kind of information from the Subtask 3 analyses of records data will
be put in a form that gives workshop participants a good picture of each
variable's usefulness and of how the variables relate to each other
empirically.

Specifically, two research staff members will introduce the mission of the
workshop series: to form one or more composites reflecting important
constructs of soldier effectiveness. The staff will then explain the
package describing the statistical properties of the candidate indices and
begin a discussion of what kind(s) of composites might be formed to tap
important aspects of the soldier effectiveness domain.




The main idea for these workshops is that the participants will have
considered opinions about the meaning of different objective indices and
the importance of each (conceptually) as an index of some aspect of soldier
effectiveness, and we as psychologists will have good knowledge of the
measurement properties of these candidate indices. These appear to be
exactly the two kinds of information and knowledge necessary to make good
judgments about forming the composite or composites.

Thus, we envision the officer and NCO participants offering their views on
the measures to go into the composite(s), and the Task 4 staff (after
seeking counsel from Task 1 researchers) providing statistical guidance to
ensure that the composite(s) are formed on a reasonably sound psychometric
basis. Workshop participants will also be asked to provide weights for the
individual elements of the composite(s), again based on a combination of
conceptual and statistical considerations. The final composites and
weights will be formed according to a consensus of the final opinions
expressed in each of the two workshops.

In sum, the output from the workshop series will be: (a) labels and
conceptual definitions of one or more composites targeted toward measuring
one or more important Army-wide criterion constructs; and (b) the member
objective indices for each composite, along with their assigned weights.
All information regarding these indices will be forwarded to Task 2
scientists.

Details of troop support for development of administrative index
composites. A total of 20 NCOs and field grade officers (10 of each) will
be assigned to participate in two one-day workshops. As with the
behavioral analysis workshops, participants should be selected to include
well-motivated and knowledgeable officers and NCOs from a variety of
specialties and types of units, to preclude obtaining a narrow view of
these administrative measures. We recommend 10 participants in each
workshop because this is a small enough group to ensure everyone is heard
from, and yet large enough to allow (across the two workshops) reasonably
broad representation in terms of specialties.

The workshops should be conducted at U.S. Army, Europe, in October 1983.
By that point we will have learned enough from the Subtask 3 records
research to prepare the briefing materials referred to previously.

Subtask 3: FY81/82 Cohort Records Collection, Data Analyses, and Report
Writing

We will: (a) examine records data, (b) collect records data on the FY81/82
cohort, (c) perform data analyses on FY81/82 cohort records, and (d)
prepare draft and final reports on FY81/82 cohort analyses.

Examination and collection of records data. The first two activities in
this subtask involve the examination and collection of records information
on the FY81/82 cohort. Accordingly, we propose to examine and collect
records data from the Enlisted Master File (EMF), the Individual Enlisted
201 File, and any other available record sources identified in Subtask 1,
to evaluate their usefulness for providing Army-wide criterion data. To
accomplish this, we will conduct 20 two-hour structured interviews with
persons familiar with one or more records sources. The focus of these
interviews will be on the state of the data (How much data are missing,
and what are the extent and types of error?) and on the meaning of the
information (How precisely is each attrition category defined?). Task 4
researchers have already conducted several of these interviews at FORSCOM
headquarters, Ft. McPherson, and with MILPERCEN officials at their
Alexandria, Virginia, location.

Data analyses. Another very important step in assessing the state of
available records is to perform preliminary data analyses on the FY81/82
cohort to determine: (a) the amount of missing data; (b) the base rates of
the variables we are concerned with in the research; (c) the psychometric
characteristics of the measures, e.g., means, standard deviations,
across-time reliability, etc.; and (d) possible serious discrepancies
between MOS in base rates, means, and standard deviations of measures (as
appropriate). The general idea is to check closely on the "quality" of the
data. High rates of missing data, very uneven base rates, poor across-time
reliability where we would expect consistency, and large differences
between MOS in average "scores" on criterion measures will be cause for
concern. However, if one measure proves to be a problem in this regard, it
is possible that another measure in the same or a similar domain might be
substituted. Whatever the exact outcome here, the intent is the same as
with the interviews conducted in Subtask 1: to become as knowledgeable as
possible about the data available on Army-wide criteria so that subsequent
analysis results are properly interpreted.
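Checks (a), (b), and (d) above can be sketched as one screening function per records variable. The Python below assumes a hypothetical layout in which each soldier's record is a dict with an "mos" key and None marks missing data; base rate is taken as the share of nonzero values among non-missing records. The MOS labels and values in the test are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

def screen_variable(records, var):
    """Missing-data rate, base rate, and mean/SD overall and by MOS for one
    records variable. `records`: list of dicts with an "mos" key; None = missing."""
    present = [r for r in records if r.get(var) is not None]
    values = [r[var] for r in present]
    by_mos = defaultdict(list)
    for r in present:
        by_mos[r["mos"]].append(r[var])
    return {
        "missing_rate": 1 - len(present) / len(records),
        "base_rate": sum(1 for v in values if v != 0) / len(values) if values else None,
        "mean": mean(values) if values else None,
        "sd": pstdev(values) if values else None,
        # Large between-MOS differences here would be a cause for concern.
        "by_mos": {mos: (mean(v), pstdev(v)) for mos, v in by_mos.items()},
    }
```

Across-time reliability, check (c), needs the same measure at two points in time and can be estimated as the correlation between the two sets of scores.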


Specifically, as stated in the proposal, we plan to perform the above
initial and exploratory data analyses on the following measures:
attrition, including different categories of the variable, e.g., medical,
drug-related, etc.; reenlistment, including bars and reenlistment choice;
Article 15 and courts-martial; promotions; school selection; and AWOL.
Visits to Ft. McPherson (FORSCOM) and MILPERCEN have revealed that
information on attrition, reenlistment eligibility, and bars to
reenlistment is available from the EMF. Information on awards, Article 15,
letters of commendation, etc., is not, however, available from the EMF.
This information, which exists on microfiche for all enlisted personnel, is
centrally located at Ft. Benjamin Harrison, Indiana. Additionally, data
analyses will be performed on any other index available in the records
that is identified in the structured interviews and might be an indicator
of important components of soldier effectiveness.

Thus, we will visit Ft. Harrison to review 400 microfiche records on
individual soldiers to help evaluate the usefulness of these records.
Analyses will be performed to evaluate base rates, the amount of missing
data, etc. What is learned from these analyses can be applied both to
research with the FY81/82 cohort and to subsequent work on objective
measure composites.




After the records data have been examined and decisions made concerning
the most promising variables, we will score each member of the FY81/82
cohort sample (all cohort members selected for the validation research) on
each of these variables. These scores will then serve as criterion data in
the ASVAB validation work to be conducted by Task 1 scientists.




After the criterion records data have been added to the LRDB by Task 1
staff, we will perform appropriate analyses on the data. These are likely
to include estimates of reliability where possible, correlations between
variables, and factor analyses to evaluate the dimensionality of the
criteria.
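Because records data are often incomplete, the correlations between criterion variables would plausibly be computed on whichever soldiers have both values (pairwise deletion). The Python sketch below assumes that choice and a hypothetical layout of one list per variable with None for missing values; the variable names and scores in the test are made up. The same function gives an across-time reliability estimate when applied to one measure recorded at two times.

```python
from itertools import combinations
from math import sqrt

def pairwise_r(x, y):
    """Pearson r on soldiers with both values present (pairwise deletion).
    Returns (r, n); r is None if fewer than 3 complete pairs or if either
    variable is constant within those pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return None, n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / sqrt(sxx * syy), n

def correlation_table(data):
    """data: {variable_name: [value or None per soldier]}.
    Returns {(var1, var2): (r, n)} for every pair of variables."""
    return {(v1, v2): pairwise_r(data[v1], data[v2])
            for v1, v2 in combinations(data, 2)}
```

Reporting n alongside each r shows how much missing data lies behind every coefficient. A factor analysis of the resulting matrix would then be run in a standard statistical package.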


Prepare draft and final report on FY81/82 cohort data analyses. Finally,
within this subtask, draft and final reports on the FY81/82 cohort analyses
will be prepared. These reports will summarize the findings from the data
analyses and, as needed, make recommendations for improving the collection
and recording of objective indices of soldier effectiveness.

Subtask  4:  Refine  Existing  Instruments 

We will: (a) suggest revisions in Army-wide criterion measures identified
in Subtasks 1, 2, and 3 in an effort to improve them, (b) examine categories
of attrition and reenlistment criteria to improve predictor-criterion
match-ups, (c) develop second-tour performance measures, and (d) develop
in-service predictors.

Revising existing Army-wide criteria. First, our staff will carefully
analyze the problems with Army-wide criteria discovered in Subtasks 1, 2,
and 3 of Task 4 and, as appropriate, suggest improvements in the measures.
Refinement of Army-wide measures may take a number of different forms,
depending upon identified requirements and the instruments or procedures
concerned. Refinements might involve nothing more than changes in records
forms or reporting requirements for personnel actions to improve the
quality of obtained data. On the other hand, refinement could require the
revision of forms, instruments, or procedures, with significant impact
ultimately upon current administrative procedures. Throughout, potential
impact upon administrative procedures and requirements will be an important
consideration. The objective will be to avoid, if possible, or to
minimize: (a) potential changes in administrative or reporting requirements
that might cause problems for operating units, and (b) additional effort by
personnel of operating units. The main point of this activity, however, is
to refine forms, instruments, and procedures so that administrative index
data are of the highest quality possible.

We propose to interview 20 persons knowledgeable about administrative
records during the Subtask 3 research. Within Subtask 4, it will be
important to return to 5 of these same individuals to review with them our
suggested revisions to records forms, instruments, and procedures for
collecting administrative data (we recommend January-February 1984). These
reviews are to evaluate the practicality of our suggested revisions. The
contract researchers have no intent to institute these changes; this
activity is meant only to provide information that might generate
recommendations for more efficient and productive collection of
administrative records data.

Examine categories of attrition and reenlistment. A second effort in this subtask involves studying categories of attrition and reenlistment to develop more homogeneous criteria. Earlier, we discussed the notion of creating more conceptually sensible predictor-criterion linkages by reducing the global attrition criterion to a series of criteria, each one homogeneous in terms of the likely reasons for the outcome, e.g., discipline problems, medical reasons, etc. With this approach, correlations between pre-induction predictors and categories of attrition are likely to be much higher, and to make better conceptual sense, than if attrition as a whole is used as a criterion. Having spoken with personnel at Ft. McPherson (FORSCOM) and MILPERCEN, we have learned that attrition, reenlistment eligibility, and bars to reenlistment are currently recorded by category on the EMF.


Thus, we plan to study the attrition and reenlistment categories to determine the frequency with which they are being used. An example of such a category is attrition for discipline-related reasons. Pre-induction predictors measuring socialization and adjustment factors are likely to be successful in predicting this kind of attrition. Likewise, other such categories will be examined using this concept, and these categories will be operationally defined for tryout in the field tests. The same approach will be used to evaluate the reenlistment categories. If, however, we find that particular categories are not being used, we will form composites of categories in the same manner that composites of other administrative indices are formed. For example, attrition due to AWOL and Article 15 may be combined to form a conceptually homogeneous composite called Disciplinary Attrition.
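A minimal sketch of the category-composite idea follows. The category labels and the `COMPOSITES` mapping are hypothetical stand-ins (the actual EMF category codes are not given here). The sketch only shows how rarely used raw categories can be flagged and how categories such as AWOL and Article 15 can roll up into a single Disciplinary Attrition composite.

```python
from collections import Counter

# Hypothetical mapping from raw separation categories to composite criteria.
# These labels are illustrative only, not actual EMF category codes.
COMPOSITES = {
    "AWOL": "Disciplinary Attrition",
    "Article 15": "Disciplinary Attrition",
    "Medical": "Medical Attrition",
}


def composite_counts(separations, composites=COMPOSITES, min_count=1):
    """Tally separations by raw category and by composite category.

    Raw categories with fewer than `min_count` cases are returned as
    `sparse`, i.e., candidates for combining into a composite.
    Categories not in the mapping fall into "Other Attrition".
    """
    raw = Counter(separations)
    sparse = sorted(c for c, n in raw.items() if n < min_count)
    combined = Counter()
    for category, n in raw.items():
        combined[composites.get(category, "Other Attrition")] += n
    return raw, combined, sparse
```

With the records `["AWOL", "Article 15", "AWOL", "Medical", "Hardship"]` and `min_count=2`, the two discipline-related categories combine into a Disciplinary Attrition count of 3. The three singleton categories are flagged as sparse.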

Developing second-tour performance measures. A third activity in this subtask is developing second-tour performance measures to serve as criteria both for pre-induction predictors and for in-service predictors. One important aspect of second-tour performance is the leader behavior of second-tour soldiers serving as NCO. Leader behavior may involve such activities as supervising, training, counseling, planning, decision making, and problem solving, all of which are common to and required of all NCO, regardless of MOS.

The proper approach to measuring such leader behavior will be to identify the dimensions of leader performance common to all NCO and then to develop measures of those dimensions. Fortunately, it will not be necessary to go through the process of empirically deriving such dimensions in this project. As part of a recently completed project in USAREUR for ARI, HumRRO personnel have already identified the dimensions of leader behavior for four levels of NCO, one of which is squad leader, the first NCO level (Hebein, Kaplan, Olmstead, & Sharon, 1983). Therefore, as a starting point, we plan to evaluate the usefulness of the dimensions derived from the USAREUR project as one set of variables for measuring second-tour performance. Additional dimensions may also be developed as part of this project.

We will carefully examine the dimensions of soldier effectiveness derived for the model prepared in Subtask 2 to evaluate their appropriateness for inclusion as second-tour performance dimensions. The dimensions from the ARI-HumRRO work, along with selected dimensions from the model, will then be presented to three workshop groups, each made up of 15 NCO and field grade officers. The dimension set will be refined on the basis of their suggestions. At these one-day workshops, to be held during January-February 1984 at Forts Hood, Bragg, and Benning, participants will also be asked to provide behavioral statements to anchor the effective and ineffective portions of each dimension. Subsequent to the workshops, our staff will refine the participants' input and submit the scales, including the behavioral anchors, to the COR and other ARI scientists, as well as to designated Army personnel, for review and suggestions. We will revise the scales based on these reviews. Thus, behavior-based rating scales will be developed for second-tour NCO performance, based in part on work done in USAREUR.


Again, the number of participants for these workshops was carefully considered and seems appropriate. We will already have behavioral dimensions from the first-tour behavioral scale development effort, as well as input from the USAREUR study. Therefore, 45 participants (three workshops of 15, or one-half the number being used in the earlier scale development work) should be sufficient to generate the needed behavioral examples.

Developing in-service predictors of second-tour performance. This effort will rely heavily on an underlying model of behavioral consistency (Wernimont & Campbell, 1968): the best predictor of future behavior in a domain is past behavior in that same domain. Accordingly, first-tour in-service predictors will be identified or developed based on a conceptual match between first- and second-tour criteria. For example, in the area of discipline-related second-tour criteria, we will seek first-tour predictors that index the same kinds of behaviors.

Specifically, we will form hypotheses about links between in-service predictors and second-tour performance, drawing on the rating scales developed in the model of soldier effectiveness and on the composite(s) of objective indices as the main in-service predictors. These hypotheses will be formally stated so that we can later check the validity of the in-service measures for predicting second-tour soldier effectiveness (Task 1 activity). Again, behavioral consistency notions will drive development of these hypotheses. Evidence of first-tour leadership (as possibly indexed by ratings on certain dimensions of the model) will be used as a predictor of second-tour leadership effectiveness. Getting in trouble during the first tour should predict discipline-related criteria in the second tour, and so on.

Subtask 5: Prepare for and Conduct Field Tests and Revise Instruments

The purposes of this subtask are to: (a) try out, under operational conditions, the Army-wide criterion measures developed to date; and (b) revise these measures to correct any shortcomings that arise. It is our intent to refine Task 4 measures through an iterative process of four field test cycles. These revisions are expected to include changes in both content and format. The fourth field test will be conducted in FY1986, after the FY83/84 cohort first-tour data have been collected and analyzed (Subtask 6).

Each field test cycle consists of three activities. The first of these is the development of a detailed field test plan. The plan will contain a rationale for collecting the data, copies of the data collection instruments themselves, and proposed data analyses. The second and third activities are the field test itself, followed by analysis of the results.

Field test data collection. The design of any one field test can be expected to vary as a function of prior field tests. However, all field tests will be related to the solution of problems that follow from attempting to measure the performance of a large number of individuals as accurately as possible. Several questions, some more critical than others, arise with regard to this effort. It is our intention to address the more critical issues in the early field tests so that any major problems that develop can be dealt with in a timely manner.

There are three issues that we believe are most important for the success of this subtask and that, as a result, will be examined in the first field test:

(a) the psychometric quality and accuracy of the resulting measurements;

(b) the applicability of the measures across disparate Army occupations; and

(c) the acceptability of the instruments to the Army.

The first field test is designed to evaluate the extent to which our instruments meet these criteria. Another issue to be explored in a later field test will be the differences in rating distributions that may be attributable to dropping the "for research purposes only" phrase from the rating instructions.

Field tests are designed around "rating units." Each rating unit consists of the individual soldier to be evaluated, two identifiable peers, and two identifiable supervisors. A peer is defined as an individual assigned to the same platoon as the ratee. A supervisor is defined as the ratee's platoon sergeant or platoon leader.

In order to collect sufficient appropriate data to address the critical issues raised above, the following data will be collected on each rating unit: (a) a minimum of two peer evaluations using prototype scales, (b) a minimum of two supervisory evaluations using prototype scales, (c) length of time known by each rater, (d) self-ratings using prototype instruments, and (e) objective/administrative indices developed in Subtask 4.2.
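The rating-unit design and data elements (a) through (e) can be represented as a simple record, as in the sketch below. The class and field names (`RatingUnit`, `Rating`, `months_known`, `admin_indices`) are hypothetical. The completeness rule only encodes the minimums stated above: two peers, two supervisors, a self-rating, and administrative indices.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rating:
    """One rater's evaluation of the ratee (hypothetical structure)."""
    rater_id: str
    source: str              # "peer", "supervisor", or "self"
    months_known: float      # element (c): length of time the rater has known the ratee
    scores: Dict[str, int]   # rating dimension -> scale value


@dataclass
class RatingUnit:
    """A ratee plus the ratings and administrative indices collected on him or her."""
    ratee_id: str
    ratings: List[Rating] = field(default_factory=list)
    admin_indices: Dict[str, float] = field(default_factory=dict)  # element (e)

    def count(self, source: str) -> int:
        """Number of ratings received from the given rating source."""
        return sum(1 for r in self.ratings if r.source == source)

    def missing_elements(self) -> List[str]:
        """Return the planned data elements that are still incomplete."""
        missing = []
        if self.count("peer") < 2:          # element (a)
            missing.append("peer ratings")
        if self.count("supervisor") < 2:    # element (b)
            missing.append("supervisor ratings")
        if self.count("self") < 1:          # element (d)
            missing.append("self-rating")
        if not self.admin_indices:          # element (e)
            missing.append("administrative indices")
        return missing
```

A check such as `missing_elements()` could be run on each unit before analysis. That would identify the incomplete units that the projected sample size is meant to absorb.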

The MOS to be used are the ones that will be tested initially by Task 5 personnel (13B - Cannon Crewman, 71L - Administrative Specialist, 95B - Military Police, 64C - Motor Transport Operator). Thus, in addition to the data collected under the auspices of Task 4 staff, other criterion data should be available on the same individuals from Task 5 research. This is extremely important, for it allows a determination of the convergent and discriminant validity of criterion measures across these two tasks. It will also reduce data collection costs and the disruption of ongoing troop activities.

In addition to the above, we also plan to collect evaluative information from the supervisory raters concerning their reactions to the rating scales themselves. This evaluative component will be augmented by brief oral interviews with approximately one-fourth of the supervisory raters, again focusing on their reactions to the scales.

Because of statistical power considerations (Schmidt, Hunter, & Urry, 1976) and likely missing or incomplete data, an initial sample of 150 rating units in each of the four target occupations is projected. If we assume a span of control of 5-10 subordinates per supervisor and the required two supervisors per individual soldier, a total of 40 supervisors and 150 soldiers representing each MOS will be required for the first field test. That is, 300 supervisory ratings spread across 40 supervisors works out to about 7.5 ratees per supervisor, within the assumed span of control. Each of these individuals will be needed for a one-day period. (During this time the Task 5 job-specific rating scales will also be administered.)
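A standard power approximation for a correlation coefficient can make the sample-size reasoning more concrete. The sketch below uses the Fisher z transformation, a common textbook approach that is not necessarily the procedure of Schmidt, Hunter, and Urry (1976). The true validity of .30 used in the example is a hypothetical value chosen for illustration.

```python
import math
from statistics import NormalDist


def correlation_power(rho: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed power for testing H0: rho = 0 with n cases.

    Fisher z approximation: atanh(r) is roughly normal with mean
    atanh(rho) and standard error 1 / sqrt(n - 3).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = math.atanh(rho) * math.sqrt(n - 3)
    # Probability that the test statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
```

Under these assumptions, 150 complete rating units give power of roughly .96 for a true correlation of .30. If a third of the units are lost to missing data (n = 100), power drops to roughly .86. The projected sample is meant to protect against this kind of loss.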


Access will also be needed to each of the 150 soldiers' 201 Personnel Files. While not yet projected, sample sites required for each of the remaining field tests are expected to be comparable.

Analyzing the field test data. These analyses will include:

1. Examining the distributions of ratings and administrative index scores - We will evaluate the distributions (i.e., means, standard deviations, and skewness) of the ratings and also of the administrative index scores. For the ratings, for example, we will have certain expectations about how the distributions should and should not look, and we will compare the actual distributions with those expectations. Severe deviations from the expected distributions would be cause for concern.

2. Evaluating the interrater agreement of ratings - Although it is possible to obtain high interrater agreement and still have very inaccurate ratings, good agreement between raters providing independent performance judgments is generally thought of as a positive sign concerning ratings. We will compute interrater agreement both within rating source (i.e., between supervisors and between peers) and across the two sources to help assess the quality of the rating data.

3. Examining the dimensionality of ratings and the administrative data - Employing factor analysis methods, we will evaluate the dimensionality (i.e., factor structure) of the rating data, and probably of the rating and administrative index data together. It may be, for example, that a distinction between technical competence and interpersonal competence/adjustment to the Army structure will emerge from the factor analytic work. Obtaining two or more reliable and psychologically meaningful factors from criterion data would be very encouraging for the validity analyses. This is because very different kinds of predictor measures are likely to be successful in predicting soldier effectiveness in very different parts of the effectiveness domain, and the emergence of such reliable factors makes it possible to study these relatively refined predictor-criterion linkages. As an example, we found that a mechanical comprehension test correlated higher with ratings of technical competence (on transmission and distribution jobs) than it did with ratings of interpersonal effectiveness on the job (Borman, Mendel, Lammlein, & Rosse, 1981).

4. Evaluating the convergent and discriminant validity of the ratings, and of the ratings and administrative indices together - We favor an analysis strategy suggested by Kavanagh, MacKinney, and Wolins (1971) to evaluate rating data. This strategy yields estimates of convergence across rating sources (essentially interrater agreement) and of the discriminant validity of ratings (how reliably raters evaluate different aspects of soldier effectiveness). The method provides good information on the quality of ratings that can be reasonably compared across settings, e.g., units. It may also be possible to use this strategy to evaluate the convergence (or across-method reliability) of the administrative index data and the rating data.

5. Assessing the accuracy of the ratings and administrative index scores - The Task 5 performance measures will presumably offer high-fidelity indices of soldier effectiveness in some parts of the job. Therefore, for the ... the Task 4 measures. We must be careful here, because we would not expect high correlations between Task 5 technical performance scores and rated effectiveness in the area of organizational commitment, for example. On the other hand, certain technical competence dimensions derived in Task 4 would be expected to correlate with performance scores generated in Task 5. Thus, where the intended content of the Task 4 and Task 5 measures is the same or very similar, correlations between the two sets of scores may provide a meaningful estimate of the accuracy of the Task 4 measures.

6. Finally, we will consider applying generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972) to analyze the ratings. Using this approach, we can identify sources of variance that may affect the distributions of ratings and can classify these sources as desirable (true) sources of variance or undesirable (error) sources. An attractive feature of this approach is that it specifies analyses of variance that could provide information on two important potential sources of error in ratings: MOS effects and location/unit effects. If these analyses show that substantial variance in ratings is due to these effects, this suggests that raters in different MOS and units are using the rating scales very differently, and that analyses cutting across MOS/units are therefore suspect. Care taken in developing and administering the scales will serve to minimize these effects, but the generalizability theory approach provides a possible method for checking on this potential problem.
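The sketch below illustrates items 1, 2, and 6. It computes moment-based skewness for a score distribution and runs a single-facet generalizability study on a fully crossed ratee-by-rater matrix. It is a simplified example with three assumptions: no missing ratings, raters as the only facet, and negative variance estimates truncated at zero. A study of MOS and location/unit effects would add those facets to the design. The function names are hypothetical.

```python
from statistics import mean, pstdev


def skewness(xs):
    """Population (moment-based) skewness; xs must not be constant."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def g_study(scores):
    """Variance components for a crossed ratee x rater design.

    scores[p][r] is the rating ratee p received from rater r; every
    ratee must be rated by every rater (no missing cells).
    Returns (var_ratee, var_rater, var_residual, g_coef), where g_coef
    is the generalizability coefficient for the mean over all raters.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    p_means = [mean(row) for row in scores]
    r_means = [mean(scores[p][r] for p in range(n_p)) for r in range(n_r)]

    # Sums of squares for ratees, raters, and the residual (interaction + error).
    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; clip negatives to zero.
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    g_coef = var_p / (var_p + var_res / n_r) if var_p > 0 else 0.0
    return var_p, var_r, var_res, g_coef
```

Consider a design with two supervisors per ratee. There, `g_coef` is the generalizability coefficient for the average of the two ratings. It equals (MS_ratee - MS_residual) / MS_ratee, the same value as a consistency-type intraclass correlation for averaged ratings. A large `var_rater` relative to `var_ratee` would signal raters using the scale differently.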

Still another very important evaluation criterion for the field test is "user acceptance," which relates to how smoothly the data collection proceeds. Recall that we plan to administer questionnaires to raters and interview them after they complete their ratings. We will evaluate the questionnaire and interview responses, with an eye toward revising our procedures if it seems warranted. We will also carefully monitor the data-gathering procedures, for both ratings and objective measures, to evaluate the feasibility of these procedures for the larger-scale data collection efforts.

Pilot test of special Combat Performance Predictor Scales. As mentioned, we plan to explore the feasibility of developing performance prediction dimensions and rating scales relevant to a combat/mobilization context. These dimensions will reflect across-MOS performance requirements in a combat situation. In a pilot field test of these rating scales, we will ask 50 rating units (20 supervisors and 20 peers of the 50 target ratees) in each of the four MOS included in the initial field test to rate soldier effectiveness using both the combat performance prediction scales and the rating scales developed to evaluate non-combat Army-wide soldier effectiveness. The primary goal will be to assess raters' ability to differentiate between non-combat performance and predicted combat performance. One index of this will be a comparison between the interrater reliability of the prediction scale evaluations and the level of reliability attained on the non-combat rating scales. Additionally, we will interview one-fourth of these raters (20 supervisors and 20 peers) to obtain their impressions of the predicted performance rating task. They will be asked how they made their prediction evaluations, how confident they felt about the predictions, and how the rating scales and/or rating procedure could be improved.

In sum, our staff will evaluate the data collection procedures and perform data analyses in an effort to identify weaknesses in the measurement system. The total evaluation effort will indicate which instruments and procedures must be improved; necessary revisions will be made, and the revised measures will be field tested.




Prepare report on field tests I, II, and III. Finally, within this subtask, a report on the first three field tests will be prepared. The report will summarize the findings from the data analyses and describe the revisions made to measures and procedures as a result of each of the three field tests. Additionally, the report will outline any needed revisions to the procedures and measures that will be evaluated in the fourth and final field test in FY 1986.




Subtask 6: FY83/84 Cohort First-Tour Data Collection and Analyses

This subtask consists of four major activities: (a) preparation of a draft and final plan for the FY83/84 cohort data collection, (b) administration of the rating and administrative measures to the sample of cohort members in the 19 MOS, (c) analysis of data from this data collection, and (d) preparation of a draft and final report on these analyses.




Draft and final plan for data collection. The Task 4 staff will prepare a draft plan for data collection from the FY83/84 cohort and submit the plan to the COR for review. The plan will discuss rating scale administration procedures and collection of objective administrative index data. After review by the COR and others he designates, we will revise the draft plan and submit a final version.

Administration of the rating and administrative measures. The HumRRO-PDRI staff members will monitor collection of the rating and administrative data. Although Task 5 researchers will for the most part be performing the actual data gathering work, Task 4 researchers will be on hand initially to ensure that the administration procedures are being handled properly. We are especially concerned about rating scale administration, and it is critical that careful attention be directed to several aspects of this effort. Selecting the proper supervisor and peer raters (according to their knowledge of ratee performance), introducing the study in a professional but motivating manner to encourage complete cooperation in the rating sessions, and training raters to increase the likelihood of obtaining accurate evaluations are all very important. These and other details of the scale administration procedures will be closely monitored by our staff.

Subject to modification based on field test results (Subtask 5), we will ask two supervisors (first sergeant and platoon leader) and two peers of the target ratees to complete the performance rating forms on the soldiers they are to evaluate. Rating assignments will be determined before the actual scale administration sessions, in discussions between researchers and the first sergeants of the ratees. Task 1 personnel will conduct the rating sessions, process the data, and enter them into the LRDB along with the administrative indices data.

For attrition and reenlistment categories, we will obtain these data on those soldiers in the FY83/84 cohort sample who have either separated or reenlisted and, again, have the information processed and entered into the LRDB.


Data analyses. Task 1 staff is responsible for validity analyses, but Task 4 researchers will perform analyses to determine the quality of the data and the instruments. These analyses will include the same ones discussed for Subtask 5: (a) evaluating the distributions of ratings and administrative index scores; (b) evaluating the interrater agreement of ratings; (c) determining the dimensionality of ratings; (d) evaluating the convergent and discriminant validity of the ratings; (e) assessing, as feasible and appropriate, the accuracy of the ratings; and (f) evaluating MOS and unit effects within the generalizability theory framework.

Prepare draft and final technical report on data analyses. After completing the data analysis work described above, we will prepare a draft report on the results of the analyses. The report will be submitted to the COR for review, and we will make the necessary revisions in a final technical report on these analyses.


Subtask 7: Revise Instruments


We will review and, as needed, revise both the behavioral rating scales and the objective indices. Specifically, based on performance rating data from the FY83/84 cohort study (first tour) and the data from field testing of the various instruments, revisions will be made both to the procedures for collecting objective indices and to the ratings of soldier effectiveness. Because the instruments will already have been revised iteratively throughout the four field tests, the revisions made at this time are anticipated to be minor refinements rather than major changes.

To accomplish the objective of this subtask, our staff intends to coordinate, through the COR, the review of these instruments by high-level Army officials and then to incorporate changes as required. Simply stated, this subtask can be viewed as one in which the final "polishing" of the instruments will be accomplished prior to administration during the FY86/87 first-tour and FY83/84 second-tour data collection.

Subtask 8: FY83/84 Cohort Second-Tour and FY86/87 Cohort First-Tour Data Collection and Analyses

This subtask consists of three activities: (a) preparation of a draft and final plan for the FY83/84 cohort second-tour and FY86/87 first-tour data collection; (b) administration of the rating and administrative measures to the two samples of cohort members; and (c) analysis of data from these data collections.




Draft and final plan for data collection. The first step in this activity will be the preparation of a Troop Support Request for (a) a sample of the FY86/87 cohort; and (b) those personnel who were measured in the FY83/84 cohort study (Subtask 6) and have remained in the service for a second tour. Of course, attrition will considerably reduce the original sample by the time this second-tour measurement is accomplished. We estimate that only about 10 percent of the original sample will be available for second-tour data collection at the sites visited.

Additionally, a draft plan for data collection from the FY83/84 and FY86/87 cohorts will be prepared and submitted to the COR for review. As in Subtask 6, the plan will discuss rating scale administration procedures and collection of objective administrative index data. After review by the COR, we will revise the draft plan and submit a final plan for data collection.

Administration of the rating and administrative measures. In this activity, FY83/84 (second-tour) and FY86/87 (first-tour) data will be collected concurrently by the cohort data collection team, i.e., Task 5 researchers, during months 69-72 of the effort. The data collection will proceed in the same manner as previously described for Subtask 6. Data on the same objective measures used for the FY83/84 cohort first-tour study will again be collected for this second-tour cohort, as well as for the FY86/87 first-tour cohort. Additionally, the behavior-based rating scales developed in Subtask 4 to tap second-tour soldier effectiveness will be administered to superiors and peers of the second-tour sample members, as well as to the sample members themselves, who will complete self-ratings. Finally, supervisory, peer, and self-ratings will be collected on the FY86/87 cohort in the same fashion as they were collected on the FY83/84 first-tour cohort (see Subtask 6 for a description of data collection procedures).

Data analyses. The main purpose for gathering these data is, of course, to evaluate the validity of current and newly developed pre-induction predictors and of the in-service predictors developed in Subtask 4. Thus, we will submit the data collected here to Task 1 staff and to the LRDB for the validity analyses. However, as in Subtask 6, the Task 4 researchers will perform analyses to determine the quality of the data collected. As before, these analyses will include, but not necessarily be limited to: (a) examining the distributions of ratings and administrative index scores, (b) evaluating the interrater agreement of ratings, (c) examining the dimensionality of the ratings and of the administrative data, and (d) evaluating the convergent and discriminant validity of the ratings, and of the ratings and administrative indices together.

In addition, as in Subtask 6, we will once again have the opportunity to assess the accuracy of some of the ratings and administrative index scores by correlating Task 4 criterion scores with selected job performance scores from the Task 5 research. However, the limitations on interpreting these correlations will be the same here as those discussed previously in relation to first-tour criteria. Likewise, evaluations of MOS and unit effects may proceed within a generalizability theory framework.


Subtask 9: Obtain Scaled Utilities


Much of the work of Project A focuses on the development of new and improved measures of performance. These measures are intended to serve as criteria for use in evaluating alternative selection and classification decisions. The goal of this subtask is to provide the basis for translating the measures of different aspects of a soldier's performance into a single, best indicator of the soldier's net worth to the Army, relative to other soldiers performing at different levels, perhaps even at different tasks.

The primary need for such a measure of each soldier's relative "utility" vs. cost to the Army is that a single criterion is needed in implementing CPAS. The decision to accept an applicant into the Army must be based on an estimate of the applicant's potential worth to the Army vs. estimated cost, in comparison to the potential net worth of other applicants. The decision to classify an applicant into one MOS rather than another must similarly be based on the applicant's net worth to the Army if assigned to each of the MOS, in comparison to other applicants who might be assigned to the various MOS.

In order to compare an applicant's relative worth to the Army in alternative
MOS, our estimate of worth to the Army must clearly be on a
single common scale across the different MOS.

It is further desirable that we be able to express the common utility scale
in a "dollar" metric. If each soldier's predicted worth can be expressed
in monetary units, it will greatly facilitate cost-benefit analyses of
alternative testing and classification procedures. The cost of additional
testing time or of the operation of CPAS, and the personnel replacement cost
of training, support, and maintenance, can be traded off against the
increases in the dollar worth of the selected applicants.
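Once worth is expressed in dollars, the trade-off described above reduces to simple arithmetic. The sketch below compares hypothetical testing alternatives by net benefit: the gain in mean dollar worth per selected applicant times the number selected, minus the added cost per tested applicant times the number tested. All function names and figures are illustrative assumptions, not Project A estimates.

```python
def net_benefit(n_selected, worth_gain_per_selectee, n_tested, added_cost_per_applicant):
    """Dollar gain from improved selection minus the added testing/operating cost."""
    return n_selected * worth_gain_per_selectee - n_tested * added_cost_per_applicant


def best_alternative(alternatives, n_selected, n_tested):
    """alternatives: name -> (worth gain per selectee, added cost per applicant)."""
    scored = {name: net_benefit(n_selected, gain, n_tested, cost)
              for name, (gain, cost) in alternatives.items()}
    return max(scored, key=scored.get), scored


# Hypothetical figures for illustration only.
options = {
    "current battery": (0.0, 0.0),
    "add 30 minutes of testing": (400.0, 60.0),
    "add 90 minutes of testing": (550.0, 180.0),
}
choice, values = best_alternative(options, n_selected=1000, n_tested=3000)
```

With these invented numbers the longer battery raises worth per selectee the most but barely pays for its extra testing time, so the 30-minute option has the largest net benefit.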


Finally, in developing an overall scale of utility, it will also be important
to state the context in which utility is to be assessed. The importance
to the Army of an increment in some type of performance may be quite
different during wartime than it is during peacetime, or at different
levels of mobilization. We will examine this possibility by collecting
utility data under more than one specified context and comparing the
results. Where differences are found, results based on the alternative
scalings must be passed on to Project B so that CPAS can be adjusted to
optimize performance under the different contexts.

While the need for a common scale of utility is clear, the potential
difficulties in the development of such a scale are equally clear. We are
attempting to create a common scale of utility across a very wide range of
circumstances based on newly developed performance measures. Most examples
of successful utility scaling that approach this order of magnitude have
been based heavily on relatively "hard" economic data rather than on the
more approximate types of performance indicators that can be achieved in
this project. If we are even approximately successful in achieving a
common dollar-valued utility metric, we will have made a significant
advance in the application of utility scaling techniques.




The various problems in the development of the needed utility metric are
described below. In each case, alternative models or procedures suggest
alternative means of handling the potential problems.




Wherever possible, we will try out as many alternatives as are
feasible in our testing so that we may obtain some indication of the level
of congruence of the results of these alternative approaches. Where there
is a high level of agreement between alternative approaches, it may be
taken as an indicator of the validity of the approaches considered. Where
such congruence is lacking, we must investigate potential
sources of this noncongruence before deciding which approach(es) to use.


Development of performance construct measures. The assessment of each
soldier's worth to the Army must necessarily be based on the performance
measures we have identified or developed in Tasks 4 and 5. The nature of
these measures varies from indirect indicators of constructs such as "obeys
orders" to directly observed measures of the performance of relevant
tasks. These measures represent only a sample of the possible measures,
e.g., only a sample of the tasks performed within an MOS. However, it is
important that this sample adequately represent the entire domain of
performance dimensions, since these are the only measures we have from which to
estimate each soldier's utility.

A problem in the development of an overall utility scale is that the number
of individual measures will be quite large. Some reduction in this number
is necessary to make the task of identifying the relative importance of
each measure more manageable. By this we do not mean that we will seek to
identify some abstract set of orthogonal dimensions through factor analytic
techniques. Rather, we will build on the multitrait-multimethod analyses
already planned for our assessment of construct validity. We will combine
the multiple measures (e.g., supervisor and peer ratings, job knowledge and
hands-on performance measures) of a single performance construct into a
single best (most reliable) composite based on the LISREL V models used in
construct validity analysis. In this discussion, we will call the resulting
composites performance construct measures, as distinct from observed
measures, although if a construct is assessed by only one observed measure,
the two will be identical.
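The plan calls for LISREL V models to form the most reliable composite of a construct's measures. As a much simpler stand-in, the sketch below assumes a single-factor (congeneric) model with uncorrelated errors and known reliabilities for standardized measures. Under those assumptions the maximally reliable weights are proportional to loading divided by error variance, that is, sqrt(r)/(1 - r), and the composite's reliability is S/(1 + S) with S the sum of r/(1 - r). The function name and inputs are hypothetical.

```python
import numpy as np


def construct_composite(measures, reliabilities):
    """Combine several observed measures of one construct into one composite.

    measures: 2-D array-like, rows = soldiers, columns = observed measures.
    reliabilities: reliability of each measure (0 < r < 1), assumed known.
    Returns (composite scores, raw weights, composite reliability).
    """
    x = np.asarray(measures, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    if np.any((r <= 0) | (r >= 1)):
        raise ValueError("reliabilities must lie strictly between 0 and 1")

    # Standardize each measure so loadings are sqrt(r) and error variances 1 - r.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    loading = np.sqrt(r)
    error_var = 1.0 - r

    # Weights proportional to loading / error variance maximize composite
    # reliability under a single-factor model with uncorrelated errors.
    w = loading / error_var
    composite = z @ (w / w.sum())

    s = np.sum(r / error_var)
    composite_reliability = s / (1.0 + s)
    return composite, w, composite_reliability
```

For two measures with reliabilities 0.8 and 0.5, S = 4 + 1 = 5, so the composite's reliability is 5/6, higher than either measure alone.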

For the purposes of obtaining utility judgments, it will be desirable to
divide the performance construct measures into distinct levels. The number
of levels may range from two, in the case of GO/NO-GO task measures, to as
many as five for measures such as five-point rating scales. Where
several observed measures have been combined, it will be important to
label the levels of the composite in terms of the more objective component
measures, e.g., at the "superior" level, 85 percent can perform task X
correctly on the first try. Once the individual performance constructs and
the distinct levels of performance within each construct have been defined
for each MOS, we are ready to proceed with the assignment of utilities.
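A minimal sketch of the level-labeling step, assuming levels are formed by cutting the composite at equal-frequency (quantile) points and each level is then described by the percentage of its soldiers who received a GO on some task X. The quantile cut rule and the function names are illustrative assumptions; in practice cut points might instead be set judgmentally.

```python
import numpy as np


def assign_levels(composite, n_levels):
    """Cut composite scores into n_levels equal-frequency levels (0 = lowest)."""
    c = np.asarray(composite, dtype=float)
    cuts = np.quantile(c, np.linspace(0.0, 1.0, n_levels + 1)[1:-1])
    # Each score's level is the number of interior cut points at or below it.
    return np.searchsorted(cuts, c, side="right")


def percent_go_by_level(levels, go_flags, n_levels):
    """Percent of soldiers at each level who performed task X correctly (GO)."""
    levels = np.asarray(levels)
    go = np.asarray(go_flags, dtype=bool)
    return [100.0 * go[levels == k].mean() if np.any(levels == k) else float("nan")
            for k in range(n_levels)]
```

The resulting percentages supply the behavioral anchors, e.g., "at this level, 100 percent can perform task X correctly on the first try."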

Development of utilities within each MOS. The exact procedure to be used
in obtaining utility estimates for performance levels within each MOS will
be developed during the first two years of this effort. Because of the
unprecedented scope of the present effort, it will be important to conduct
tryout studies to test the feasibility and the validity of alternative
approaches in the present context. Since the actual data collection for
utility scaling does not take place until the latter half of 1985, there
will be sufficient time for such tryouts.


In general there are two approaches to the development of utility scales:
(a) to use naturally occurring indicators of the relative importance of
different levels of performance on the different performance constructs,
and (b) to gather judgmental data from qualified respondents. While the
former approach may provide some insights, we feel that it is essential to
gather judgmental preference data from high-ranking military officers.
Only in this way can ARI be assured that the selection and classification
system will result in personnel assignments that are in accordance with the
performance expectations of senior military leaders. We will look for
indirect indicators such as differential rates of advancement for soldiers
performing at different levels on each construct, but the whole motivation
for the effort to develop performance measures stems from the absence of
adequate operational indicators of utility.

There are a number of alternative approaches for eliciting utility scaling
data from expert judges (see Luce & Suppes, 1965; Raiffa, 1970). These
range from approaches identified with Multi-Attribute Utility Theory (see
Edwards, 1977; Keeney & Raiffa, 1976), which obtain judgments of the relative
importance of different attributes along which the stimuli are rated,
to approaches based on a conjoint measurement perspective (Luce & Tukey,
1964), which assess the importance of different attributes through
comparisons of individuals with different values on the different attribute
(performance construct) dimensions. In the former case, the concern is with
whether judges are able to give valid ratings of the attribute dimensions
directly. In addition, it may be necessary to perform a separate scaling
of the utility of each level of each performance construct before a
linear composite function is appropriate. With such independent scalings
of the attribute dimensions, it would not be possible to take interactions
among the attributes into account. (The difference between good and
mediocre performance on one dimension may depend on the level of some other
attribute.) In the case of the other approaches, there is still a concern with
the validity of the individual judgments, but the primary concern is
generally with the number of individual judgments that may be required to yield
stable parameter estimates.
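To make the contrast concrete, the sketch below shows the additive (linear composite) form that a Multi-Attribute Utility approach would use once each construct's levels have been scaled separately: overall utility is a weighted sum of single-construct utilities. The construct names, level utilities, and weights are hypothetical. Because each term depends on only one construct, this form cannot represent the interactions noted above.

```python
def additive_utility(profile, level_utilities, weights):
    """Weighted-additive multi-attribute utility of one performance profile.

    profile: construct -> performance level index for one soldier.
    level_utilities: construct -> list of utilities for that construct's levels
        (scaled separately, e.g., 0 for the lowest level, 100 for the highest).
    weights: construct -> importance weight; weights are assumed to sum to 1.
    """
    return sum(weights[k] * level_utilities[k][level] for k, level in profile.items())


# Hypothetical scaling for two constructs in one MOS.
level_utilities = {
    "technical proficiency": [0.0, 40.0, 100.0],  # three levels
    "discipline": [0.0, 100.0],                   # two levels
}
weights = {"technical proficiency": 0.7, "discipline": 0.3}

u_a = additive_utility({"technical proficiency": 2, "discipline": 0}, level_utilities, weights)
u_b = additive_utility({"technical proficiency": 1, "discipline": 1}, level_utilities, weights)
```

Here profile A (utility 70) outranks profile B (utility 58). A judge who felt that high proficiency is worth little when discipline is poor would be expressing an interaction that this additive form cannot capture.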


Regardless of the exact procedure that is found to be most effective, a
vital component in the development of utility scales within each MOS will
be the description of the stimuli to be ranked or rated. It is critical
that, whether individuals or constructs are being described, the
descriptions be behaviorally anchored to the maximum extent possible. Rather than
saying that a soldier to be rated had "moderate" discipline problems, for
example, we would say that this soldier had two disciplinary reports in the
past year. Rather than saying "performs task X well," we prefer statements
such as "85 percent of the soldiers at this level can perform task X
correctly on the first try" (or "within 15 minutes").


In collecting within-MOS utility scaling data, we will first focus on the
nine MOS for which hands-on performance data will be available. For each
of these nine MOS, we will develop procedures and stimuli and try them out,
first with project and ARI staff, and then with three SME. The procedures
will be tried twice in counterbalanced order: (a) with performance
constructs defined in terms of all the measures available for that MOS, and
(b) with the constructs defined in terms of only the Army-wide measures
available for the MOS. Insofar as possible, microcomputers will be used to
control the presentation of stimuli and the collection of responses.


Following the procedures outlined in the next section, we will then rescale
the alternate sets of composite utilities for the nine MOS onto a common
scale, again using small groups of raters performing the task in
counterbalanced order. Based upon the scale values obtained, we will compute two
sets of composite utility scores for the soldiers tested¹ in each MOS. If
(as we suspect) these utilities correlate quite highly, have equivalent
means and variances within MOS, and produce the same order of statistical
differences across MOS, it will then be reasonable to use the performance
constructs defined in terms of only the Army-wide measures when we rescale
the utility scales for all sampled MOS onto a common scale. If the two
sets of composite utility scores for the tested soldiers are quite
different, it will mean that the inclusion of the hands-on performance
levels in the definitions of the constructs substantially influenced the
raters. Utilities derived exclusively from Army-wide measures for the
ten MOS for which we do not have hands-on measures will then have to be
interpreted cautiously if some way to adjust the utilities through
statistical or judgmental means is not found.

When the exploratory research with the first nine MOS is completed, we will
turn to the ten MOS for which only Army-wide measures are available. The
process yielding the most equivalence between the two sets of utilities in
the first nine MOS will be repeated using the available measures. When we
are satisfied with our approach, we will obtain within-MOS utility scales
for all 19 MOS using larger groups of SME.

¹Initially, data from the field trials of the performance measures, or
even dummy data, will be used. When the data from the FY83/84 cohort
become available, these analyses will be repeated.
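The equivalence checks described above for the two sets of composite utility scores (high correlation, equivalent means and variances within MOS, same pattern across MOS) can be summarized with a few statistics. The sketch below assumes both sets of scores are available for the same soldiers within one MOS. It uses the rank order of MOS means as a loose proxy for "the same order of statistical differences." The function names, returned keys, and MOS codes are hypothetical.

```python
import numpy as np


def compare_utility_sets(u_all_measures, u_army_wide):
    """Compare two sets of composite utility scores for the same soldiers."""
    a = np.asarray(u_all_measures, dtype=float)
    b = np.asarray(u_army_wide, dtype=float)
    return {
        "correlation": float(np.corrcoef(a, b)[0, 1]),
        "mean_difference": float(b.mean() - a.mean()),
        "sd_ratio": float(b.std(ddof=1) / a.std(ddof=1)),
    }


def same_mos_ordering(means_all, means_army_wide):
    """True if both scorings rank the MOS mean utilities in the same order."""
    order_a = sorted(means_all, key=means_all.get)
    order_b = sorted(means_army_wide, key=means_army_wide.get)
    return order_a == order_b
```

A correlation near 1, a mean difference near 0, an SD ratio near 1, and an unchanged MOS ordering would together support using the Army-wide-only construct definitions.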

Rescaling the utility of each MOS onto a common scale. While it may be
possible to develop a scale that is constant across different MOS at the
same time that the within-MOS scales are developed, we do not expect to do so.
The division of the problem into separate steps for within- and across-MOS
utilities allows us to examine the results and assumptions of the more
detailed within-MOS scaling before proceeding to the next stage. It is
almost certainly the case that the number of judgments required to perform the
entire scaling task in one step would be prohibitive. In addition, we
think it is likely that the within-MOS utility scaling would benefit from
the advice of raters with detailed knowledge of the MOS being rated, while
the across-MOS utility scaling requires judgments from officers at a
relatively higher level of command.

Because of the complexity of the evaluation, it would be unwise to expect
raters to be able to holistically order the importance of performance
increments across the different MOS in the absence of more specific criteria
for evaluation. Our approach, therefore, will be to use an iterative
procedure, involving consultation with relevant literature and expert opinion,
to distill a good set of general evaluative criteria (Army goals) that are
broad in scope yet meaningful, practical, and internally consistent. Some
of the factors and issues that will be considered in formulating an
attribute set for evaluating MOS utility include: (a) impact on force readiness and
on military survival; (b) centrality of MOS activity; (c) contribution to
psychological and physical well-being of forces, and to civilian
relationships and support; (d) effects on enemy performance; etc. These types of
factors and numerous others will be itemized, and then configured into a
coherent set of evaluative criteria.




The criteria that will be used to assist raters in judging the relative
importance of performance increments in different MOS will be determined by
conducting half-day workshops with two groups of 10-15 more senior military
officers. The participants in each workshop will carry out two tasks.
First, they will use critical incident methodology to generate specific
examples of when performance in a particular MOS was judged to be
particularly effective or ineffective. They will also be asked to list the
reasons why the particular episode was judged to be effective or
ineffective. Second, the participants will be presented with a series of choices
to be made under various contexts such as the following:


Suppose during wartime you have ten Wheel Vehicle Repairers
(63W) and ten Track Vehicle Repairers (63Y). Five of the
63W and five of the 63Y are highly competent and the others
make frequent mistakes. If you could replace two of the
less able Repairers with more able ones, would you choose
to do this for the 63W or the 63Y?

(If the judge favors the 63W:) How many 63Y replacements
would be of equal value to the replacement of two 63W?


After his or her choices are made, the participant will be asked to
verbalize, or write down, the reasons for the choice.
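Each answer to a question of this kind implies a ratio of values. If a judge favors the 63W and says that k 63Y replacements are worth as much as two 63W replacements, then, assuming value is linear in the number of soldiers replaced, one 63W replacement is worth k/2 times one 63Y replacement. The sketch below computes that ratio and pools several judges with a geometric mean. The geometric-mean pooling is an illustrative choice, not one taken from the plan, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def implied_value_ratio(n_favored, n_equivalent_other):
    """Value of one favored-MOS replacement relative to one other-MOS replacement.

    Assumes value is linear in the number of soldiers replaced.
    """
    return Fraction(n_equivalent_other, n_favored)


def pooled_ratio(ratios):
    """Geometric mean of several judges' ratios (suits ratio-scale judgments)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

For example, a judge who says three 63Y replacements equal two 63W replacements implies a ratio of 3/2; a judge implying 2 and another implying 1/2 pool to 1.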




On the basis of the content of the reasons given for the critical
incidents and the reasons for the hypothetical choices, the research staff
will derive the most parsimonious and most frequently cited components of
importance for each context, separately and combined. The content analysis
will be done separately for the two workshops, and only components, i.e.,
reasons, upon which the two groups agree will be used.
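The rule that only reasons cited by both workshops are retained can be sketched as follows. The sketch assumes each workshop's content analysis yields a list of coded reason labels, with one entry per citation. The labels and function name are hypothetical, and the coding itself remains a judgmental task for the research staff.

```python
from collections import Counter


def agreed_components(workshop_a, workshop_b):
    """Reasons coded in both workshops, most frequently cited (combined) first."""
    count_a, count_b = Counter(workshop_a), Counter(workshop_b)
    shared = set(count_a) & set(count_b)
    combined = {reason: count_a[reason] + count_b[reason] for reason in shared}
    # Sort by combined frequency (descending), then alphabetically for ties.
    return sorted(combined, key=lambda reason: (-combined[reason], reason))
```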




At the level of across-MOS comparisons, it will be desirable to identify
four or five levels of performance on the within-MOS utility scales and to
again anchor these levels in descriptions of the constructs (perhaps half a
dozen or so) most relevant to the utility scale for that MOS. Thus a
"superior infantryman" might be defined in terms such as:

(1) 95 percent of the infantrymen at this level can fire a
rifle within some specified level of accuracy,

(2) 90 percent of the infantrymen at this level can clean
and reload the rifle correctly within x minutes, and

(3) fewer than 5 percent of the infantrymen at this level
have had any discipline problems at all during the past
year.




We expect something on the order of five to ten statements to "define" each
performance level for the MOS. We will investigate the feasibility and
validity of alternative procedures for eliciting the required ratings from
expert judges. One of the methods that will be explored is to ask first
for ratings of the relative importance of the different evaluation criteria
to the overall mission of the Army. The judges would then be presented
with descriptions of different performance levels in different MOS and
asked to rate the value of various performance increments according to each
criterion. Then overall evaluations of performance increments in the
different MOS can be computed and displayed to the judges. The judges
would be allowed to modify their ratings on the basis of this feedback.
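The computation behind the feedback step can be sketched as a weighted sum. Each judge's importance ratings for the evaluation criteria are normalized to weights, and each performance increment's overall value is the weighted sum of its per-criterion ratings. The criterion names, rating scales, and MOS increments below are hypothetical, and the plan does not commit to this particular combination rule.

```python
def normalize(importance):
    """Turn raw importance ratings for the criteria into weights summing to 1."""
    total = sum(importance.values())
    return {c: v / total for c, v in importance.items()}


def overall_values(importance, increment_ratings):
    """increment_ratings: increment -> {criterion -> rated value of that increment}."""
    w = normalize(importance)
    return {inc: sum(w[c] * r[c] for c in w) for inc, r in increment_ratings.items()}


importance = {"readiness": 6, "survival": 3, "civilian support": 1}
ratings = {
    "63W: mediocre to superior": {"readiness": 8, "survival": 5, "civilian support": 2},
    "63Y: mediocre to superior": {"readiness": 7, "survival": 7, "civilian support": 1},
}
values = overall_values(importance, ratings)
# Displayed to the judge, highest first, as feedback for possible revision.
feedback = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
```

With these invented ratings the two increments come out close (6.5 versus 6.4), which is the kind of result a judge might want to revisit after seeing it.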

For both the within-MOS and the across-MOS utility scaling, it will be
important to elicit a sufficient number of judgments to significantly
overdetermine the appropriate scaling. In this way it will be possible to
create internal evidence of the consistency with which judgments have been
made, both within and across raters. As in the case of the within-MOS
scaling, we will proceed from tryouts with small groups of raters to larger
groups in order to obtain more stable and valid values using the procedures
found to be most appropriate.
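One way to exploit an overdetermined set of judgments is sketched below, assuming judges give ratio judgments of the form "item i is worth r times item j." Taking logarithms turns these into linear equations in the log utilities. These equations can be solved by least squares with one item fixed as the unit, and the root-mean-square residual then serves as internal evidence of consistency. This log-least-squares formulation is an illustrative choice, not a procedure specified in the plan.

```python
import numpy as np


def scale_from_ratio_judgments(n_items, judgments):
    """Least-squares utilities from overdetermined ratio judgments.

    judgments: iterable of (i, j, ratio), meaning "item i is worth `ratio`
    times item j". Item 0 is fixed at utility 1 to set the unit.
    Returns (utilities, root-mean-square residual on the log scale).
    """
    rows, rhs = [], []
    for i, j, ratio in judgments:
        row = np.zeros(n_items)
        row[i], row[j] = 1.0, -1.0          # log u_i - log u_j = log ratio
        rows.append(row)
        rhs.append(np.log(ratio))
    anchor = np.zeros(n_items)
    anchor[0] = 1.0                          # log u_0 = 0
    rows.append(anchor)
    rhs.append(0.0)

    a = np.vstack(rows)
    b = np.array(rhs)
    log_u, *_ = np.linalg.lstsq(a, b, rcond=None)
    residuals = a[:-1] @ log_u - b[:-1]      # judgment equations only
    return np.exp(log_u), float(np.sqrt(np.mean(residuals ** 2)))
```

Three judgments among three items already overdetermine the two free utilities. When the judgments are mutually consistent the residual is zero; a contradictory set (say, a claimed ratio of 8 where the other two judgments imply 4) leaves a nonzero residual.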

Assigning dollar values to performance utility levels. As stated above, it
is highly desirable that a translation of the utility scale into dollars be
derived. This step is essential in comparing the benefits (in the utility
of increased performance) of CPAS to the related costs. One such
comparison of particular interest is between the costs and benefits of
various increases in testing time. Fortunately, it will not be necessary
to obtain direct dollar values for all MOS/performance level combinations
if the performance utilities have been reliably and validly scaled. A
sample of MOS/performance levels can be monetized, and the dollar values of
the utilities of the remaining combinations obtained through derived
utility/dollar translation curves.
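A minimal sketch of a utility/dollar translation curve follows. It assumes that a handful of MOS/performance-level combinations have been monetized directly and that dollar value is piecewise linear in utility between those anchor points. The anchor values are invented for illustration; the real curve's shape would be an empirical question.

```python
import numpy as np

# Hypothetical monetized anchors: (utility scale value, dollars).
anchor_utility = np.array([0.0, 25.0, 50.0, 100.0])
anchor_dollars = np.array([0.0, 4000.0, 10000.0, 30000.0])


def utility_to_dollars(utility):
    """Piecewise-linear translation; values outside the anchors are clamped."""
    return np.interp(utility, anchor_utility, anchor_dollars)
```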

The problems of achieving the desired translation, however, are many and
large. Short of detailed simulations whose cost would be prohibitive, we
are not optimistic about achieving more than a very approximate
translation. We will, however, allocate time for further reviews of relevant
literature and for discussions with others who are trying to derive similar or
related estimates. At present, we see three general approaches to this
issue. We will explore the feasibility of obtaining a translation through
each approach so that we can use the level of agreement as an indicator of
the accuracy of the final translation. The three approaches are:

(1) assessment of the value of performance through surveys
of decision-makers,

(2) inference of the value from cost data and some
assumptions about the relation between cost and value, and

(3) comparisons with the civilian sector.

Sinden and Worrell (1979) describe several methods of eliciting valuation
judgments. "Direct questioning" approaches could be used to ask military
officials how much more they would be willing to pay for specified
increments of performance. Other approaches involve "budget allocation" or
different "trade-off games." These latter approaches, which are favored by
the authors, could involve comparisons of performance increments with
hardware or other items with known dollar costs. We will develop and try out
alternative variants of these approaches and administer the best of them at
the same time that the across-MOS judgments are collected.

In exploring the second approach, the proposed methodology is based on the
economic concept of the duality of cost and output functions. (A profit
maximization problem subject to a cost constraint has a "dual," which is a
cost minimization problem subject to a production constraint.) Relative
values of inputs may be inferred either from production data or from cost
data. In the present case, the valuation of performance might be inferred
from estimates of the cost of producing a given level of performance.

If this project were being conducted by a profit-maximizing business firm,
an obvious candidate for use in valuing performance in various occupations
would be wages (adjusted to include fringe benefits and associated costs of
employment borne by the employer). Such costs would represent the cost to
the employer of maintaining a worker in a given job. Most economists would
deem this approach superior to surveying managers and asking for relative
valuation of output from employees at differing performance levels, even
though there would be some variation in performance within a wage class.

In the present case, however, the Army does not pay in proportion to
productivity or opportunity cost of labor except in relatively indirect ways
such as reenlistment bonuses. Instead, wages are, in effect, "administered
prices" and, as such, are inappropriate for determining the value of
performance differences directly. An alternative concept that takes account
of all costs associated with Army personnel at given performance levels is
replacement cost. This cost includes the expected recruitment, testing,
processing, training, and compensation costs necessary to replace a soldier
functioning at a given level of performance in a given MOS.

It is important to note that, as with other settings in which administered
prices exist, shortages and excess supplies may result. (For a recent
review of some of the effects of administered prices, see Mincer,
1982.) These results impose other costs on the Army which must be
considered as part of the total cost of maintaining a soldier in a particular MOS
at a particular performance level. For example, in occupations for which
Army pay is less than alternatives in the civilian sector, recruiting costs
to fill such jobs may be high, and loss rates due to failures to reenlist
may also be high. On the other hand, if Army pay is better than pay in
civilian jobs, there may be excess supply to the Army, resulting in low
recruiting costs and high reenlistment rates. Because replacement cost
includes attrition and non-reenlistment losses, it is comprehensive enough
to take account of loss rates attributable to non-comparability of civilian
and military pay scales. (Because training is costed explicitly, a problem
that might exist with civilian wages does not arise here.)

For any MOS/job performance category, the cost of replacing an individual
soldier may be computed. Cost data on training, testing, and processing
should be directly available from the Army MIS, as are separation rates;
however, recruiting costs by recruit "quality" levels will have to be
inferred. Some econometric studies of the determinants of recruit supply can
be used to infer the marginal cost of recruiting by education and test
score category (see Dale & Gilroy, 1983, and Huck & Midlam, 1977, for
examples).
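The replacement-cost idea can be sketched as follows. The sketch assumes that losses occur at a known rate before a replacement is fully functioning, so that 1 / (1 - loss rate) accessions must be recruited, tested, processed, and trained for each soldier actually replaced; compensation is then added once. The field names, the loss-rate treatment, and the figures are assumptions for illustration. An actual computation would draw on Army MIS cost data and econometric estimates of recruiting cost.

```python
from dataclasses import dataclass


@dataclass
class ReplacementInputs:
    """Hypothetical per-accession cost components for one MOS/performance level."""
    recruiting: float
    testing: float
    processing: float
    training: float
    compensation: float  # cost of maintaining the replacement once in the job
    loss_rate: float     # fraction of accessions lost before becoming a replacement


def replacement_cost(c: ReplacementInputs) -> float:
    """Expected cost of putting one functioning replacement in place."""
    if not 0.0 <= c.loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    front_end = c.recruiting + c.testing + c.processing + c.training
    # Each successful replacement requires 1 / (1 - loss_rate) accessions.
    return front_end / (1.0 - c.loss_rate) + c.compensation
```

With invented front-end costs of $4,000 per accession, a 20 percent loss rate, and $15,000 in compensation, the expected replacement cost is $4,000 / 0.8 + $15,000 = $20,000.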

The third approach will be to examine data on workers employed in the
civilian sector. Wage differentials could be related to differences in
either ratings of performance levels or performance levels predicted from
employee ability and aptitude. If both ability and performance measures
correlated poorly with wages for specific occupations, this would suggest
that cost utility might not be sensitive to performance level for that
occupation. This approach will require finding comparable civilian
occupations for at least some of the MOS studied and using longitudinal data sets
that contain both wage and ability estimates for these occupations. One
example of such a database is the Project TALENT study (Wise, McLaughlin, &
Steel, 1978), which contains information on earnings at age 29 of over one
hundred thousand individuals in civilian occupations and also scores from
the prior administration of a two-day test and questionnaire battery.
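For the third approach, the simplest way to relate wage differentials to performance is a regression of wages on a performance measure within a comparable civilian occupation. The slope estimates the dollar value of a one-unit performance difference, and a weak correlation would signal the insensitivity noted above. The sketch assumes a single predictor and ordinary least squares; the data are invented.

```python
import numpy as np


def wage_slope(performance, wages):
    """OLS slope, intercept, and correlation of annual wages on a performance measure."""
    x = np.asarray(performance, dtype=float)
    y = np.asarray(wages, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r
```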

In addition, research that relates cognitive and non-cognitive abilities
to wages, supervisor ratings, and other evaluative measures of performance
in the civilian sector will be reviewed for insight into relationships
between performance and wages. (Examples include Gintis, 1971; Griliches &
Mason, 1972; and Wise, 1975.)

Schedule and troop support requirements. This subtask will be accomplished
jointly by Task 4 and Task 1 personnel, with Task 4 personnel having
primary responsibility for the collection of the data and Task 1 personnel
primary responsibility for the data analysis. As shown in the Integrated
Master Plan, the subtask has been divided into three periods of activity.
During the development stage (December 1984 through July 1985), we will
first require the assistance of three subject matter experts in each of the
nine MOS for which MOS-specific measures are developed. Three interactions
of roughly one-half day each are anticipated. In addition, we will need to
select and begin to work with a sample of 30 military officials who will
supply the initial utility ratings. Each will participate in
one of the half-day workshops to develop the criteria to aid the across-MOS
judgments and, on a selective basis, in the tryouts of the judgmental
methods. When the preliminary research on the nine MOS has been
accomplished, additional SME will be used to scale the performance constructs
within the MOS for which only Army-wide measures are available. The
military officials will then scale the MOS/performance level combinations for
all 19 MOS.


During the second stage (August 1985 through February 1986), we will
examine the score distributions, reliability, and intercorrelation of
the performance measures obtained for each MOS and determine whether the
within-MOS performance construct scaling should be redone (as may be the
case if we decide to drop some measures or combine others in different ways
based on the empirical results). If so, an additional one-half day will be
required from three SME in each MOS to provide data for within-MOS
scalings. In either event, one additional day will be required from each of 30
senior experienced officers (a new sample) to provide the cross-MOS ratings
at each level (including cost/value estimation). These sessions can be
scheduled independently at the convenience of the officers involved. The
utility values obtained will be used to help evaluate the
cost-effectiveness of various measurement alternatives in Task 2 decisions concerning the
composition of the Experimental Battery.

During the final stage (August 1988 through March 1989), the data
collection and analysis process will be repeated to allow for the incorporation
of second-tour measures (administered to the FY83/84 cohort) and to
accommodate any changes in the performance battery used in the longitudinal
validation (FY86/87 cohort). In addition, the analysis will examine the
consistency between the utilities obtained earlier (in 1986) and the more
recent set in an attempt to identify the impact on the utilities of such
factors as inflation, changes in the civilian labor market and the U.S.
general military stance, and innovations in military doctrine, equipment,
and manning policies. The troop support requirements will be the same as
for the second stage.


Subtask 10: Prepare Final Report


We will first prepare a draft final report describing all Task 4 research
to  this  point.  The  report  will  be  submitted  to  the  COR  and  revisions  made 
based  on  his  feedback.  We  will  then  submit  the  final  report  incorporating 
all  comments  and  suggestions  from  the  COR. 




SUMMARY OF EXPECTED OUTCOMES FROM TASK 4


Operational  Outcomes 


The following outcomes of the research should be useful to the Army from
an operational standpoint:

1. The model of soldier effectiveness should provide a concrete, behavioral
   definition of what is expected of a first-tour soldier in the Army. This
   definition can in turn be used to explain to first-tour enlisted
   personnel what is expected of them. The model could be made part of
   indoctrination courses, for example, to show incoming troops what they
   should be striving for as soldiers in the U.S. Army. We see great
   advantages in using the model to help indoctrinate soldiers into the
   Army. For one thing, the behavioral nature of the model will make the
   communicated expectations for effective soldier performance unambiguous.
   For another, presenting a single model to all incoming enlisted personnel
   provides a common set of expectations, so that everyone receives the same
   information about them and misleading guidance on soldier performance
   requirements is avoided.




2. Another possible application of the soldier effectiveness model is to
   provide recruiters with standardized guidance about what is expected of
   first-tour soldiers, so that they can pass this information on to
   prospects and recruits. The model's dimensions and behavioral performance
   requirements can be packaged to present a realistic but also highly
   motivating depiction of what it takes to be a successful soldier, and
   recruiters can be instructed on how to present these materials to
   prospects and recruits. The package could then be used both to help sell
   prospects on an Army enlistment, by providing an in-depth definition of
   what is meant by "soldiering" in the Army, and as a kind of realistic job
   preview (Wanous, 1973) showing recruits what performance requirements to
   expect during their first term.

3. A third use for the model of soldier effectiveness is to provide new
   dimensions for the EER. The model's dimensions should be ideal as EER
   dimensions because they will be based on an across-MOS, Army-wide
   analysis of enlisted jobs, and the performance requirements emerging in
   the model will reflect concrete, observable dimensions of soldier
   effectiveness.

4. The rating scale administration package and procedures can be used in
   future personnel research in the Army. A major effort in the present
   research will be to develop an effective but very efficient set of
   procedures for administering performance rating scales to large numbers
   of persons. These procedures, and the package of materials found most
   effective and efficient, can certainly be adapted for use in other Army
   personnel research where ratings of many persons are required.

5. Likewise, in this research we will develop a system and procedure for
   scoring individual enlisted personnel on administrative index composites
   and on attrition/reenlistment categories. Future personnel research
   requiring soldier effectiveness scores can certainly use this system and
   procedure.


6. Also likely to arise from Task 4 research are guidelines for recording
   and collecting administrative data more consistently across units. These
   guidelines should considerably increase the quality and usefulness of the
   administrative data for indexing aspects of soldier effectiveness. Thus,
   in future personnel research requiring such effectiveness scores on
   enlisted personnel, the comparability of these scores across units and
   the accuracy of the data should be substantially better than for the
   administrative indices presently available.

Scientific Outcomes

The  following  scientific  outcomes  are  anticipated  from  the  Task  4  research: 

1. A major theme of the research in Task 4 is the development of principles
   and conclusions regarding effective and efficient methods of gathering
   rating data from large numbers of persons. What we learn in this effort
   (e.g., the kinds of instructions to raters, rules for selecting raters,
   and training and orientation for raters that lead to relatively
   high-quality ratings) will be important knowledge that can be applied
   wherever large-scale rating data collection is required. Considerable
   research has been done on the effects of different rating formats on the
   quality of ratings (e.g., Dunnette & Borman, 1979; Landy & Farr, 1980),
   but work is needed to specify aspects of the context in which raters are
   placed that likewise influence the quality of ratings. The series of
   field tests planned for Task 4 research should yield considerable
   knowledge in this area.


2. A trend in the recent literature has been to examine the accuracy of
   ratings (where possible), instead of or in addition to assessing
   psychometric characteristics of ratings such as halo, leniency, and
   restriction of range (Bernardin & Pence, 1980; Borman, 1979). The
   argument is that these two sets of rating-quality criteria often do not
   correspond very closely, and that on a conceptual basis accuracy is the
   more important criterion. As an example, we have found that a certain
   rater training program reduced halo but left the accuracy of the ratings
   unaffected (Borman, 1979).

   Although we look favorably on the trend toward considering accuracy
   (e.g., Dunnette & Borman, 1979), one criticism of research on rating
   accuracy is that it has been performed in the laboratory, using
   "paper-people" ratees (written descriptions of how someone performs on a
   job) or videotaped performers (Campbell, 1978). These settings may not be
   the most realistic, and laboratory findings might not generalize well to
   organizational settings.

   A problem here is that organizational settings typically provide no
   opportunity to obtain actual, or "true," performance scores on
   individuals against which to compare ratings of their performance. For
   the vast majority of jobs, no absolute standards exist that would allow
   true performance scores to be assigned to employees, and thus the
   accuracy of ratings cannot be evaluated.


   In this research, however, we will have available at least an
   approximation of these true scores. Task 5 researchers will develop
   performance tests that presumably yield high-fidelity performance scores,
   providing an accurate picture of each soldier's actual performance level
   (at least in the technical competence aspects of the job). We anticipate
   using these performance scores as standards against which to compute the
   accuracy of the Task 4 ratings made in the research.²

   We are very enthusiastic about this opportunity to bring research on
   performance rating accuracy out into an actual organizational setting.
   Examining the effects on rating accuracy of different rating formats, the
   various administrative conditions under which raters are placed,
   different rater training and orientation procedures, and so on, in a
   "real" organizational context should yield important results and
   conclusions bearing on how to generate more accurate ratings.


² Performance measurement in Task 5 relates to the "can-do" part of the
criterion space, that is, the skill- or ability-related proficiency aspects
of performance. The "will-do" part of the criterion space, that is, the
soldier's performance on the job over time, his/her continuing motivation to
succeed, etc., cannot be tapped well by this kind of performance
measurement. Thus, we must be careful in correlating Task 4 ratings with
Task 5 performance scores and in how we interpret these relationships.
However, Task 4 ratings of a soldier's technical competence, for example,
can justifiably be compared to Task 5 performance scores to evaluate the
accuracy of those ratings.

3. Some research has examined supervisory, peer, and self-rating sources,
   comparing the relationships between rating sources (e.g., Heneman, 1974;
   Klimoski & London, 1974) and the contributions to validity of ratings
   from these sources (Borman, 1974; Buckner, 1959; Campbell, Dunnette,
   Lawler, & Weick, 1970). Typically, however, such studies take place in a
   single organizational setting, with the relationships between raters
   (i.e., supervisors and peers) and ratees largely fixed. In other words,
   the familiarity of each organizational level's raters with the ratees'
   work is held constant. For example, in the organization studied, peers
   may be very familiar, and supervisors not at all familiar, with a
   ratee's work. In addition, raters from different organizational levels
   are likely to view only a narrowly defined part of a ratee's
   performance. For example, supervisors may be able to observe only a
   ratee's interpersonal skills, while peers may observe the ratee's
   behavior in all aspects of the job.

   The point is that the Army offers great variety in this regard. In some
   units (e.g., infantry), both supervisors and peers will be very
   knowledgeable about ratee performance in all aspects of the job. In other
   units (e.g., maintenance), raters from different organizational levels
   may view very different aspects of a ratee's job performance. Therefore,
   research can be conducted to evaluate how different rater-ratee work
   relationships, and different opportunities for raters to observe ratee
   performance, influence their performance/effectiveness evaluations of
   ratees. An obvious question here is how these relationships and
   opportunities relate to interrater agreement and to the accuracy of the
   ratings.

4. Research on composites of low-base-rate objective measures may lead to
   some general guidelines for dealing with base-rate problems in personnel
   research. Low base rates in psychological research have been an
   acknowledged difficulty for many years (Meehl & Rosen, 1955), and forming
   composites of such measures is one possible approach to alleviating
   these problems. Task 4 research can assess the usefulness of this
   approach.

5. James (1973), Smith (1976), and others have written about applying
   construct validity principles to criterion development, but little has
   been done to follow up with such applications. Likewise, in our proposal
   we discuss carrying out Army-wide criterion development steps within a
   construct validation framework. In the Task 4 research, however, we plan
   to put these construct validity principles into practice.

   First, the model of soldier effectiveness is meant to be an inductively
   derived behavioral definition of the dimensions of Army-wide performance
   and effectiveness. This behavioral definition should exhaust the domain
   of important performance requirements and effectiveness dimensions that
   pertain to all MOS, and it will drive the development of measures to tap
   each element of the model. In the measure development work, careful
   attention will be directed to selecting the most appropriate method(s)
   for indexing performance/effectiveness on each of the model's
   dimensions. Supervisor and peer ratings, along with objective indices of
   effective and ineffective soldier behavior, will be targeted toward the
   appropriate model dimensions.




   Regarding analysis of criterion data, two concepts will guide our
   efforts. First, we will use the ideas of multitrait-multimethod analysis
   and of convergent and discriminant validity (Campbell & Fiske, 1959;
   Kavanagh, MacKinney, & Wollins, 1971) to evaluate the quality of
   objective indices and of ratings from different sources. Wherever
   possible, we will strive for convergence across methods in measuring a
   performance/effectiveness construct, and for differentiation in the
   measurement of very different kinds of constructs.


   The reason differentiation is important is that predictor-criterion
   links tend to make better conceptual sense when specific rather than
   global criteria are available. This is the same reasoning used in
   developing categories of attrition: very different predictors are likely
   to be appropriate for predicting medical attrition and attrition for
   disciplinary reasons.

   Second, to aid in differentiating between constructs when measuring
   criterion performance and effectiveness, we will apply a strategy that
   combines factor analysis of criterion data with hypotheses about the
   underlying performance/effectiveness constructs. Each hypothesized
   construct should comprise important, homogeneous variables, yet different
   constructs should show conceptually distinct content.

   As mentioned previously, we have some hope that such an approach will be
   fruitful. Borman, Rosse, and Abrahams (1980) found that a conceptually
   reasonable three-factor structure described Navy recruiter performance,
   and this three-dimension solution was replicated in two other samples.
   Also, in ratings of transmission and distribution worker performance, a
   two-factor solution (technical competence, and interpersonal adjustment
   to job demands) made excellent conceptual sense (Borman, Mendel,
   Lammlein, & Rosse, 1981). Thus, in Task 4 research, we will be alert to
   possible underlying constructs that might map the
   performance/effectiveness domain in a conceptually appropriate manner
   and that can be differentially measured with our criterion instruments.

In  sum,  we  believe  that  reliance  on  these  and  other  applications  of 
construct  validation  thinking  will  lead  to:  (a)  a  better  conceptual 
picture  of  the  Army-wide  criterion  domains;  (b)  more  accurate  measures 
of  the  relevant  performance/effectiveness  constructs;  and  (c)  more 
meaningful  and  valid  predictor-criterion  relationships. 

6. One intention behind the model of soldier effectiveness is to define a
   broad set of domains relevant to a soldier's worth or value to his/her
   unit and the Army. This is a broader view of performance/effectiveness
   than is typically adopted, and we may learn something new from taking
   this approach. For example, the expanded conception of soldier
   effectiveness may better address some of the intersections between
   individual and organizational effectiveness. Consider the organizational
   commitment domain introduced in the preliminary model of soldier
   effectiveness. It has nothing to do with individual performance as we
   usually think of it, but commitment on the part of some critical
   percentage of a unit's members might have considerable impact on the
   unit's overall effectiveness.


   We do not mean to say that this contract will solve many problems in the
   area of organizational effectiveness. However, the relatively broad view
   of soldier effectiveness may help to shed some light on how individual
   performance/effectiveness relates to organizational effectiveness. The
   results of this view will also be interesting in their own right. We have
   indicated some possible domains that might emerge from the model
   development work, but others may be identified as well. Thus, the
   content of the total set of dimensions developed in the model will itself
   be of interest.

7. Although the work is admittedly exploratory, we should learn something
   about performance ratings from the development and testing of the Combat
   Performance Prediction Scales. What is planned here is very unusual in
   relation to performance rating formats and, especially, to what is
   required of raters. Format development steps will lead to dimensions of
   performance reflecting a completely different context (i.e., combat) from
   the one currently being experienced (i.e., the garrison setting). Thus,
   it will be of interest to see how scale development works within these
   constraints.

   More scientifically compelling, however, is learning about the effect on
   ratings of the unusual task asked of raters using the scales, i.e., to
   evaluate how they believe each soldier would perform in a very different
   setting. The rater must observe and recall the soldier's behavior in the
   garrison/field setting and make inferences about how he would perform in
   combat. This is somewhat akin to making "ratings of potential" for
   higher-level jobs, as is done in many organizations. However, the jobs
   for which such predictions are made are typically more similar in
   context to the present job than is the case when combat performance is
   predicted from garrison performance.

   Raters may have considerable difficulty making these inferences. The
   research planned will explore how these inferences are attempted,
   whether raters can differentiate between present and predicted
   performance, and what levels of interrater agreement emerge in such a
   rating task. These and possibly other analyses should tell us something
   about the evaluative judgment process under conditions that require
   considerable inference.

8. Certain results from Task 4 research should bear on the
   "trait-situation controversy" in personality psychology. Briefly, the
   trait side argues that relatively stable personal characteristics of
   individuals for the most part determine behavior across a variety of
   situations (e.g., Block, 1971). "Situationists" argue that
   characteristics of the context or situation in which people find
   themselves largely dictate behavior, regardless of the personal
   characteristics of the individuals involved (e.g., Shweder, 1975). There
   is also an interactionist position that considers behavior to be a
   function of an interaction between the person and the situation (e.g.,
   Bowers, 1973). The issues in this controversy are complex and technical,
   but this summary is sufficient for our purposes.

   Within the present research, consider second-tour performance. It would
   seem to be a function of (a) personal characteristics, i.e., those that
   persons bring with them to the service, and (b) first-tour experiences,
   e.g., characteristics of the unit and the quality of training and
   leadership experienced. These can be viewed as trait versus situation
   factors, and we will be able to get a general idea of the contribution
   of each to second-tour performance, because test and inventory scores (an
   index of traits) will be available for many soldiers. Thus, by comparing
   the correlations between inventory scores and second-tour performance
   with the correlations between first-tour experience responses and that
   performance, we can gain some idea of the relative contributions of
   traits versus the situation to second-tour soldier effectiveness.

9. Finally, the Task 4 research program will produce recommendations on
   practical procedures for determining performance utilities in complex
   employment situations involving a number of different jobs and settings.
   We will develop a standard computer-administered tool that can be used
   repeatedly to derive, extend, modify, and maintain utilities. The
   availability of a relatively easy-to-follow procedure for measuring
   utility will promote the use of formal decision rules in selection and
   classification research and in employment, thus increasing the
   efficiency of personnel utilization in our society.


In sum, we believe that the Task 4 research program should produce
significant scientific and operational outcomes. That is, the expected
outcomes of Task 4 will contribute to our general knowledge and
understanding of performance/effectiveness measurement in a large
organization and should also contribute to the operational needs of the
U.S. Army.
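The accuracy analysis described in scientific outcome 2 scores Task 4 ratings against Task 5 hands-on scores treated as approximate true scores. One minimal way to do this, assuming both sets of scores are numeric and cover the same soldiers, is to standardize each set and report (a) their correlation and (b) the mean absolute distance between the standardized values. These two indices are a simplification chosen for illustration, not the full accuracy decomposition used in the rating literature. The function names and data are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize scores to mean 0 and SD 1 (population SD).

    Raises ZeroDivisionError if all scores are equal.
    """
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def rating_accuracy(ratings, true_scores):
    """Compare ratings with performance-test scores used as approximate true scores.

    Returns (correlational accuracy, mean absolute distance in z units).
    With population z-scores, the mean cross-product equals Pearson's r.
    """
    zr, zt = z_scores(ratings), z_scores(true_scores)
    r = mean(a * b for a, b in zip(zr, zt))
    distance = mean(abs(a - b) for a, b in zip(zr, zt))
    return r, distance


# Hypothetical ratings and hands-on test scores for five soldiers.
r, d = rating_accuracy([3, 4, 2, 5, 4], [61, 70, 55, 82, 66])
print(round(r, 2), round(d, 2))
```

Standardizing both sets first means that differences in scale between a rating form and a test score do not count as inaccuracy. Only the relative ordering and spacing of soldiers matter.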
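Scientific outcome 3, and the discussion of the Combat Performance Prediction Scales, both turn on interrater agreement. As a minimal sketch, assuming every ratee is rated by the same number of raters on one dimension, the following computes the one-way random-effects intraclass correlation, ICC(1), from the between-ratee and within-ratee mean squares. The choice of ICC(1) is an assumption made for illustration, since the plan does not name an agreement index. The function name and data are hypothetical.

```python
def icc1(ratings):
    """One-way random-effects ICC(1) for a ratee-by-rater matrix.

    ratings: list of rows, one per ratee, each holding k ratings (k >= 2).
    """
    n, k = len(ratings), len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-ratee mean square: spread of ratee means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-ratee mean square: disagreement among raters of the same ratee.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical supervisor and peer ratings (columns) for four soldiers (rows).
print(round(icc1([[5, 4], [3, 3], [6, 7], [2, 2]]), 3))
```

Computing this index separately for unit types (for example, infantry versus maintenance) is one way to compare agreement across the rater-ratee relationships described in outcome 3.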
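Scientific outcome 4 proposes combining low-base-rate objective measures into composites. A minimal sketch, assuming each measure is a hypothetical 0/1 indicator per soldier (for example, whether a disciplinary action is on record), sums the indicators into a composite. It then reports how many soldiers each index separates from zero. With rare events, any single indicator is zero for nearly everyone, while a sum over several indicators typically flags more soldiers and so gives the composite more usable variance. The function names and data are illustrative only.

```python
def base_rate(indicator):
    """Proportion of soldiers for whom a 0/1 indicator is 1."""
    return sum(indicator) / len(indicator)


def composite(indicators):
    """Sum several 0/1 indicators soldier by soldier.

    indicators: list of equal-length lists, one list per administrative index.
    """
    return [sum(per_soldier) for per_soldier in zip(*indicators)]


def nonzero_rate(scores):
    """Proportion of soldiers with a composite score above zero."""
    return sum(1 for s in scores if s > 0) / len(scores)


# Hypothetical indicators for eight soldiers, each with a low base rate.
indicators = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0],
]
scores = composite(indicators)
print([base_rate(i) for i in indicators], scores, nonzero_rate(scores))
```

An unweighted sum is the simplest possible composite. Weighting indicators by judged severity would be an equally plausible design choice.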
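Scientific outcome 5 relies on multitrait-multimethod logic (Campbell & Fiske, 1959). A minimal sketch, assuming a correlation matrix whose variables are labeled by (construct, method) pairs, computes two averages: the convergent validities (same construct, different methods) and the heterotrait-heteromethod correlations (different constructs, different methods). Convergent values that clearly exceed the heterotrait values are one sign of discriminant validity. The construct and method labels and the correlations below are hypothetical, and this summary covers only one of several Campbell-Fiske comparisons.

```python
from itertools import combinations
from statistics import mean


def mtmm_summary(labels, corr):
    """Summarize a multitrait-multimethod correlation matrix.

    labels: list of (construct, method) tuples, one per variable.
    corr: symmetric correlation matrix (list of lists) in the same order.
    Returns (mean convergent r, mean heterotrait-heteromethod r).
    """
    convergent, heterotrait = [], []
    for i, j in combinations(range(len(labels)), 2):
        (trait_i, method_i), (trait_j, method_j) = labels[i], labels[j]
        if method_i == method_j:
            continue  # monomethod correlations are not used in this summary
        if trait_i == trait_j:
            convergent.append(corr[i][j])
        else:
            heterotrait.append(corr[i][j])
    return mean(convergent), mean(heterotrait)


# Hypothetical constructs rated by supervisors and peers.
labels = [("technical", "supervisor"), ("discipline", "supervisor"),
          ("technical", "peer"), ("discipline", "peer")]
corr = [[1.0, 0.3, 0.6, 0.2],
        [0.3, 1.0, 0.1, 0.5],
        [0.6, 0.1, 1.0, 0.4],
        [0.2, 0.5, 0.4, 1.0]]
print(mtmm_summary(labels, corr))
```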
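Scientific outcome 8 compares two sets of correlations. A minimal sketch, assuming each soldier has a trait inventory score, a first-tour experience score, and a second-tour performance score (all hypothetical numbers below), computes the two zero-order Pearson correlations side by side. Zero-order correlations ignore any overlap between traits and first-tour experience, so this gives only the rough "general idea" comparison the text describes, not a partition of variance. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def trait_vs_situation(inventory, first_tour_experience, second_tour_performance):
    """Return (trait r, situation r) with second-tour performance as the criterion."""
    return (pearson_r(inventory, second_tour_performance),
            pearson_r(first_tour_experience, second_tour_performance))


# Hypothetical scores for five soldiers.
performance = [1, 2, 3, 4, 5]
r_trait, r_situation = trait_vs_situation([2, 3, 3, 5, 4], [2, 1, 4, 3, 5], performance)
print(r_trait, r_situation)
```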
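The final scientific outcome promises a standard computer-administered tool for deriving utilities. One common form such a tool could take is a weighted additive model, assumed here for illustration in the spirit of Edwards's (1977) multiattribute utility measurement. Raters assign importance weights to effectiveness dimensions and 0-100 utilities to a soldier's level on each dimension, and the overall utility is the sum of the utilities weighted by the normalized weights. The dimension names, numbers, and function names are hypothetical, and the plan does not specify the tool's model.

```python
def normalize_weights(raw_importance):
    """Convert raw importance judgments into weights that sum to 1."""
    total = sum(raw_importance.values())
    return {dim: value / total for dim, value in raw_importance.items()}


def multiattribute_utility(single_utilities, weights):
    """Weighted additive utility: sum of weight * single-dimension utility (0-100)."""
    return sum(weights[dim] * single_utilities[dim] for dim in weights)


# Hypothetical importance ratings and one soldier's single-dimension utilities.
weights = normalize_weights({"technical": 60, "discipline": 30, "fitness": 10})
overall = multiattribute_utility({"technical": 80, "discipline": 50, "fitness": 100}, weights)
print(round(overall, 1))
```

Because the weights are stored separately from the single-dimension utilities, either can be revised without re-eliciting the other, which suits a tool meant to extend and maintain utilities over time.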


REFERENCES 


Bernardin, H.J., & Pence, E.C. Effects of rater training: Creating new
response sets and decreasing accuracy. Journal of Applied Psychology,
1980, 65, 60-66.

Block,  J.  Lives  through  time.  Berkeley,  CA:  Bancroft  Books,  1971. 

Borman, W.C. The rating of individuals in organizations: An alternate
approach. Organizational Behavior and Human Performance, 1974, 12,
105-124.

Borman, W.C. Effects of instructions to avoid halo error on reliability
and validity of performance evaluation ratings. Journal of Applied
Psychology, 1975, 60, 556-560.

Borman, W.C. Format and training effects on rating accuracy and rater
errors. Journal of Applied Psychology, 1979, 64, 410-421.

Borman, W.C., & Dunnette, M.D. Behavior-based versus trait-oriented
performance ratings: An empirical study. Journal of Applied Psychology,
1975, 60, 561-565.

Borman, W.C., & Peterson, N.G. Selection and training of personnel. In
G. Salvendy (Ed.), Handbook of Industrial Engineering. New York: John
Wiley, in press.

Borman, W.C., Hough, L.M., & Dunnette, M.D. Development of behaviorally
based rating scales for evaluating the performance of U.S. Navy
recruiters. Navy Personnel Research and Development Center Technical
Report TR-76-31, 1976.

Borman, W.C., Johnson, P.D., Motowidlo, S.J., & Dunnette, M.D. Measuring
motivation, morale and job satisfaction in Army careers. Minneapolis:
Personnel Decisions, Inc., 1976.

Borman, W.C., Mendel, R.M., Lammlein, S.E., & Rosse, R.L. Developing and
evaluating the validity of a test battery to predict performance in
transmission and distribution jobs at Florida Power and Light.
Minneapolis: Personnel Decisions Research Institute, 1981.

Bowers, K.S. Situationism in psychology: An analysis and a critique.
Psychological Review, 1973, 80, 307-336.

Buckner, D.N. The predictability of ratings as a function of interrater
agreement. Journal of Applied Psychology, 1959, 43, 60-64.

Campbell, D.T., & Fiske, D.W. Convergent and discriminant validation by
the multitrait-multimethod matrix. Psychological Bulletin, 1959, 56,
81-105.


Campbell, J.P. What we are about: An inquiry into the self concept of
industrial and organizational psychology. Presidential address to
Division 14, American Psychological Association, Toronto, August 1978.

Campbell, J.P., Dunnette, M.D., Arvey, R., & Hellervik, L. The development
and evaluation of behaviorally based rating scales. Journal of Applied
Psychology, 1973, 57, 15-22.

Campbell, J.P., Dunnette, M.D., Lawler, E.E., III, & Weick, K.E., Jr.
Managerial behavior, performance, and effectiveness. New York:
McGraw-Hill, 1970.

Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. The
dependability of behavioral measurements: Theory of generalizability for
scores and profiles. New York: Wiley, 1972.

Dale, C., & Gilroy, C. The effects of the business cycle on the size and
composition of the U.S. Army. Atlantic Economic Journal, 1983, 11.

Dunnette, M.D. Personnel selection and placement. Belmont, CA:
Wadsworth, 1966.

Dunnette, M.D., & Borman, W.C. Personnel selection and classification
systems. In L.W. Porter and M.R. Rosenzweig (Eds.), Annual Review of
Psychology, 1979, 30, 477-525.

Edwards, W. How to use multiattribute utility measurement for social
decision making. IEEE Transactions on Systems, Man, and Cybernetics,
1977, SMC-7, 326-345.

Flanagan, J.C. The critical incident technique. Psychological Bulletin,
1954, 51, 327-358.

Gintis, H. Education, technology, and the characteristics of worker
productivity. American Economic Review, 1971, 61, 266-279.

Griliches, Z., & Mason, W. Education, income, and ability. Journal of
Political Economy, 1972, 80, 74-103.

Guion, R.M. Personnel testing. New York: McGraw-Hill, 1965.

Hammer, T.H., & Landau, J. Methodological issues in the use of absence
data. Journal of Applied Psychology, 1981, 66, 574-581.

Hamner, W.C., Kim, J.S., Baird, L., & Bigoness, W.J. Race and sex as
determinants of ratings by potential employers in a simulated
work-sampling task. Journal of Applied Psychology, 1974, 59, 705-711.

Hebein, J., Kaplan, A., Olmstead, J.A., & Sharon, B. NCO leadership:
Tasks, skills, and functions. Alexandria, VA: Human Resources Research
Organization, Final Report, February 1983.

Heneman, H.G., III. Comparisons of self- and superior ratings of
managerial performance. Journal of Applied Psychology, 1974, 59, 638-642.



Huck, D., & Midlam, K. Development of methods for analysis of the cost of
enlisted attrition. General Research Corporation Technical Report, 1977.


James, L.R. Criterion models and construct validity for criteria.
Psychological Bulletin, 1973, 80, 75-83.

Kavanagh, M.J., MacKinney, A.C., & Wollins, L. Issues of managerial
performance: Multitrait-multimethod analyses of ratings. Psychological
Bulletin, 1971, 75, 34-49.




Keeney, R., & Raiffa, H. Decisions with multiple objectives. New York, NY:
Wiley-Interscience, 1976.

Klimoski, R.J., & London, M. Role of the rater in performance appraisal.
Journal of Applied Psychology, 1974, 59, 445-451.

Landy, F.J., & Farr, J. Performance rating. Psychological Bulletin, 1980,
87, 72-107.

Landy, F.J., & Trumbo, D.A. Psychology of work behavior. Homewood, IL:
Dorsey, 1980.

Lawler, E.E., III. The multitrait-multirater approach to content
validity. Journal of Applied Psychology, 1967, 51, 369-381.

Luce, R.D., & Suppes, P. Preference, utility, and subjective probability.
In R.D. Luce et al. (Eds.), Handbook of Mathematical Psychology, III. New
York, NY: Wiley-Interscience, 1966.

Luce, R.D., & Tukey, J.W. Simultaneous conjoint measurement: A new type
of fundamental measurement. Journal of Mathematical Psychology, 1964,
1, 1-27.


Meehl, P.E., & Rosen, A. Antecedent probability and the efficiency of
psychometric signs, patterns, or cutting scores. Psychological
Bulletin, 1955, 52, 194-216.

Mincer, J. The economics of wage floors. New York, NY: Columbia
University, 1957.

Motowidlo, S.J., & Borman, W.C. Behaviorally anchored scales for
measuring morale in military units. Journal of Applied Psychology, 1977,
62, 177-183.

Peterson, N.G., Houston, J.S., & Rosse, R.L. The LOMA job-effectiveness
prediction system: Job performance criteria. Submitted to Life Office
Management Association, July 1981.

Raiffa, H. Decision analysis: Introductory lectures on choices under
uncertainty. Reading, MA: Addison-Wesley, 1968.

Schmidt, F.L., Hunter, J.E., & Pearlman, K. Assessing the economic impact
of personnel programs on work force productivity. Personnel Psychology,
1982, 35, 333-347.




Schmidt, F.L., Hunter, J.E., & Urry, V.W. Statistical power in
criterion-related validity studies. Journal of Applied Psychology, 1976,
61, 473-485.

Shields, J.L., Hanser, L.M., Williams, E.W., & Popelka, B.A. Pilot
research for validation of ASVAB and enlistment standards against
performance on the job. Paper given at Military Testing Association,
October.

Shweder, R.A. How relevant is an individual difference theory of
personality? Journal of Personality, 1975, 43, 455-484.

Sinden, J.A., & Worrell, A.C. Unpriced values: Decisions without market
prices. New York, NY: Wiley-Interscience, 1979.

Smith, P.C. Behaviors, results, and organizational effectiveness: The
problem of criteria. In M.D. Dunnette (Ed.), Handbook of Industrial and
Organizational Psychology. Chicago: Rand McNally, 1976.

Smith, P.C., & Kendall, L.M. Retranslation of expectations: An approach
to the construction of unambiguous anchors for rating scales. Journal of
Applied Psychology, 1963, 47, 149-155.

Steers,  R.M.  Antecedents  and  outcomes  of  organizational  commitment. 
Administrative  Science  Quarterly,  1977,  22,  46-56. 

Taylor, E.K., & Wherry, R.J. A study of leniency in two rating systems.
Personnel Psychology, 1951, 4, 39-47.

Terborg, J.R., & Ilgen, D.R. A theoretical approach to sex discrimination
in traditionally masculine occupations. Organizational Behavior and
Human Performance, 1975, 13, 352-376.

Van Maanen, J., & Schein, E.H. Toward a theory of organizational
socialization. In B.M. Staw (Ed.), Research in organizational behavior
(Volume 1). Greenwich, CT: JAI Press, 1979.

Wanous, J.P. Effects of a realistic job preview on job acceptance, job
attitudes, and job survival. Journal of Applied Psychology, 1973, 58,
327-332.

Wernimont, P.F., & Campbell, J.P. Signs, samples, and criteria. Journal
of Applied Psychology, 1968, 52, 372-376.

Wise, D. Academic achievement and job performance. American Economic Review, 1975, 65(3), 350-366.

Wise, L.L., McLaughlin, D.H., & Steel, L. The Project TALENT Databank Handbook. Palo Alto, CA: American Institutes for Research, 1973.




TASK  5  RESEARCH  PLAN 


MEASUREMENT  OF  MOS-SPECIFIC  PERFORMANCE 

GENERAL  PURPOSE  OF  TASK  5 


The type of linked personnel decision-making system that will result from this project has long been of interest to the Army. However, the Army currently has neither the system nor the data to make critical personnel decisions throughout a soldier's life cycle based on the soldier's job performance and the needs of the Army. In the system currently in use, initial selection and classification decisions are predicated on the relationships of entrance tests to performance in the Advanced Individual Training (AIT) environment, not to performance on the job. In fact, with few exceptions, entrance tests have not been validated using job performance as a criterion.

Before we can evaluate the relationships between scores on predictor instruments and actual job performance, we must resolve the criterion problem, the key problem in this project and the joint objective of Tasks 4 and 5. Task 4 is concerned with the development of valid measures of overall performance as a soldier, i.e., of constructs that apply to all MOS. The purpose of Task 5 is to develop criterion instruments that accurately measure MOS-specific job performance.


BACKGROUND  ISSUES  AND  RATIONALE 


Developments in two areas set the stage for work on Task 5: performance evaluation in the Army, and job and task analysis.

Performance Evaluation in the Army

The Army's mobilization experience in World Wars I and II led to a classification process based on matching an individual's civilian job skills with those of a comparable military job. The process emphasized occupational code equivalencies rather than independent standards of job competency. It was not until 1955, when the Army instituted the Enlisted Evaluation System (EES), that standards were established and proficiency began to be assessed. The EES used a job knowledge test, more commonly referred to as the MOS test or Pro Pay Test, together with a Commander's Evaluation Report (CER) to evaluate enlisted personnel in grades E-3 and above annually. Results of the evaluations were used chiefly for personnel management purposes: to determine eligibility for reenlistment, promotion, proficiency pay, additional schooling, and the like.

The MOS test was a norm-referenced achievement test designed to measure job knowledge in a broad sense. No claim was made that the MOS test measured job proficiency, although by implication something related to job proficiency was being tapped. Standard four-alternative multiple-choice test questions were drafted by military SME personnel in Item Writing Agencies (service schools). The questions were based on test outlines prepared by the Enlisted Evaluation Center (EEC) from functional MOS descriptions contained in AR 611-201. Thus the content of a test tended to reflect the MOS-producing training program; its content validity was limited to the degree to which the field requirements of an MOS were reflected in the MOS functional description and training program. Psychometricians who staffed EEC edited and revised items submitted by the schools, assembled the 125-item tests, sent them to the field for administration by a Test Control Officer, scored the returned tests, and combined each score with the commander's twelve-factor rating (CER) in reporting to each soldier his MOS Evaluation Score.

EEC maintained item banks based on conventional internal-consistency item statistics. With isolated exceptions, the MOS tests neither included performance components nor were validated against external criteria of job proficiency. Indeed, the EEC was not staffed to handle performance testing or field validation.

In 1973, due largely to the influence of the performance-based training movement, the Army changed its approach to soldier evaluation by moving from norm-referenced paper-and-pencil tests to criterion-referenced performance tests. The new tests were called Skill Qualification Tests (SQT). Test results were still to be used for personnel management purposes, but the primary focus of SQT was redirected toward the training and combat readiness of individual soldiers.

The overriding requirement of SQT was that they be job relevant. Test content was tied to critical job tasks that were identified through job and task analysis and described in the Soldier's Manual given to each soldier.


Performance was tested by one of three methods: (1) hands-on, a standard performance test in which the task conditions are simulated and the soldier demonstrates performance; (2) written knowledge, a multiple-choice test about critical elements of task performance; and (3) performance certification, a task-based evaluation conducted by the commander in the actual job setting. These alternatives were intended to give the test developer needed flexibility in accommodating tasks with different behavioral characteristics and different situational support requirements.

Methods  of  Performance  Measurement 

The evolution of performance measurement in the Army gives rise to an issue that is central to the conduct of Task 5 research and development activities: appropriate methods of measurement. Issues concerning the choice of methods generally center on trade-offs between the cost and validity of alternative approaches. The more precisely one specifies the performance to be observed and the conditions under which it is to be observed, the higher the cost.

Frederickson (1962) and Engel (1970) offer simple taxonomies of performance evaluation measures. Both tend to distinguish measures along two continua of remoteness or indirectness relative to actual job performance: the remoteness of the test behavior observed and the remoteness of the observer or scorer. Job performance tests are generally viewed as the most direct method, since they call for application of knowledge and demonstration of skill by eliciting behaviors that are equivalent, or nearly equivalent, to those required in the job setting. But the directness of this method, with its inherent relevance, content validity, and fairness, comes at a price. Many personnel managers believe the benefits of performance testing justify neither the demands on facilities and personnel (Harris & Mackie, 1962) nor the wear and tear on equipment (Angell et al., 1964). Also, the level of professional skill available in the military to develop and administer performance tests has been questioned (Vineberg & Taylor, 1972). Yet another shortcoming of performance tests, obvious but not widely discussed, is that the greater administrative time they require usually restricts coverage of the job domain; one can measure fewer job tasks per unit of time than is possible with less direct measures.

The shortcomings of performance tests, especially cost, have led to the widespread use of job knowledge tests. Job knowledge tests consist of questions about task performance, usually delivered in a paper-and-pencil multiple-choice format. They are indirect measures to the extent that the behaviors measured do not constitute task performance but only mediate it.


Despite their evident economy, a question lingers concerning the degree to which knowledge tests can adequately gauge a person's job performance capability, either in terms of the range of job behaviors that can be validly represented by knowledge items, or in the sense that knowledge testing in a paper-and-pencil mode presumes at least minimal literacy. Shirkey (1966), Urry, Shirkey and Waldkoetter (1965), and Yellen (1966) found correlations between job knowledge test scores and work-sample criteria too low to support the use of knowledge tests alone to assess individual proficiency in the MOS of medical specialist, supply specialist, cook, and truck vehicle mechanic. Similar results were obtained in Engel and Rehder's (1970) study of general vehicle repairmen and in Foley's (1974) review of the research on maintenance performance. On the other hand, knowledge tests do appear to have adequate validity for jobs with minimal motor-skill demands (e.g., personnel specialist), provided that only knowledge actually required on the job is covered in the test (Urry, Shirkey & Nicewander, 1965; Vineberg & Taylor, 1972). Adequate validity was also observed in a more recent study by Osborn and Ford (1977), in which knowledge tests were evaluated against a hands-on mastery criterion for low-skill manual tasks. Controlling for mental ability and level of task mastery, correlations on the order of .70 were found between various kinds of knowledge tests and hands-on task performance. It is important to note that these high correlations were attributable to two factors: (1) the skilled aspect of the tasks tested consisted essentially of recalling functions, not of manual performance, making a knowledge medium appropriate; and (2) the knowledge items were meticulously tied to the critical steps in task performance through a careful task analysis.
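"Controlling for" mental ability and task mastery describes a partial correlation: the knowledge/hands-on correlation computed after the linear influence of the two control variables has been removed. The sketch below is illustrative only. It is not Osborn and Ford's procedure, and the function names are hypothetical. It applies the standard first-order partial correlation formula twice to hold two variables constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def second_order_partial(x, y, z, w):
    """Correlation of x and y with both z and w held constant.

    Applies the first-order formula twice: partial z out of every pair
    involving x, y and w, then partial w out of the x-y correlation.
    """
    r_xy_z = partial_r(pearson(x, y), pearson(x, z), pearson(y, z))
    r_xw_z = partial_r(pearson(x, w), pearson(x, z), pearson(w, z))
    r_yw_z = partial_r(pearson(y, w), pearson(y, z), pearson(w, z))
    return partial_r(r_xy_z, r_xw_z, r_yw_z)
```

As a hypothetical call, `second_order_partial(knowledge, hands_on, ability, mastery)` would return the knowledge/hands-on correlation with ability and mastery held constant. A high value means the knowledge test tracks hands-on performance beyond what the two control variables alone would predict.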

Affective classes of behavior, such as motivation to perform a task, can be assessed by performance tests if one uses unobtrusive measures (Osborn & Ford, 1977). But embedding a task in a simulated job context broad enough to permit the task to be performed voluntarily requires time and expense that are not typically justifiable. Standardization and scoring problems also militate against attempting to test motivational behaviors in situ (e.g., Harris et al., 1975).

Similarly, time pressures, inadequate supplies and equipment, and lack of peer or supervisory support can all influence the performance of soldiers who otherwise know how and want to do the job correctly. An indirect measure, usually in the form of a rating by a supervisor or peer, is therefore considered a more feasible method of tapping the affective or "will do" aspects of job behavior. Supervisor or commander ratings typically do not correlate highly with job knowledge or job sample test performance (e.g., Engel & Rehder, 1970; Vineberg & Taylor, 1972), but this does not rule out their use for measuring aspects of performance not represented in knowledge or hands-on tests. Such ratings can be particularly useful when developed in ways that anchor the rater's judgments to specific, relevant job behaviors (e.g., Borman, Dunnette & Johnson, 1974; Borman, Hough & Dunnette, 1976; Campbell, Dunnette, Arvey & Hellervik, 1973; Toquam & Borman, 1981).

Job and Task Analysis in the Army

Procedures for systematic job and task analyses were developed in the 1950s, largely as the result of research conducted by the U.S. Air Force (e.g., Miller, 1953; Van Cott, Berkun & Purifoy, 1955; Christal, 1969, 1974). These procedures have been widely used by the Army in support of the system engineering of individual training, as articulated in the Instructional Systems Development (ISD) approach (e.g., TRADOC Pam 350-30, 1975). A number of documents have been produced that provide guidance in applying these procedures to Army jobs and tasks (e.g., CON Reg 350-100-1, 1972; CON Pam 350-11, 1973; TC 21-5-7, 1977; TRADOC Cir 351-28, 1978; and TRADOC Pam 351-4 (T), 1979).


The Army's use of job analysis procedures has tended to be training oriented. That is, the information provided has been used largely to help make decisions about the need for and content of training given in AIT and other specialized courses. While other activities have also reflected and benefited from the knowledge gained from task analytic work (e.g., designing job aids, developing SQT, constructing selection batteries, preparing job-related handbooks and manuals), the primary impetus for gathering task analytic information has come from the various proponent schools. In fact, the schools have the primary responsibility for carrying out the task analyses for their own MOS. The training emphasis of task analytic work has important implications for the work to be done in Task 5.

The task data collection procedure most favored by the Army and other services is the job inventory, a standardized, self-administered checklist. It is the method of choice because interviews, observation-interviews, technical conferences, and open-ended questionnaires have too many limitations to be useful in any large-scale data collection program (Rupe, 1956). It is the approach currently at the heart of the Army Occupational Survey Program (AOSP).




The checklist contains items describing a variety of duties and tasks related to a given MOS. These items are drawn from information already known about the job, primarily from existing documentation and from SME. (Guidelines for constructing a job inventory have been described by Morsh and Archer [1967] and are included in TRADOC Pam 351-4.) Soldiers who are incumbents of the target MOS are instructed to check the duties and tasks that they perform, and to rate them on one or more dimensions, such as frequency of performance and the relative amount of time they require. Research has shown that incumbents are the best source of job inventory data; their supervisors do not have sufficiently precise knowledge of how duties and tasks differ in terms of time spent or other dimensions (Madden, Hazel & Christal, 1964).

A quantitative assessment of job activities can be obtained from a statistical analysis of the checklist responses using the Comprehensive Occupational Data Analysis Program (CODAP). CODAP can be used to rank-order duties and tasks according to the percentage of soldiers who perform them and the relative time spent on each. This information, when combined with a number of other factors, is often used to select the critical tasks that will be the focus of training and evaluation activities. Survey Reports, prepared by the U.S. Army Soldier Support Center, summarize the results of the surveys for each MOS. These reports include valuable information on the structure and nature of the MOS by skill level. A major use of these analyses is to determine which of the tasks comprising an MOS should be taught in a formal school setting (e.g., in AIT rather than on the job). Once these tasks have been selected, they are subjected to more detailed scrutiny to determine (1) how they are best taught, and (2) the nature of the requirements they impose on the trainee.
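The two indices used to rank-order tasks from job inventory responses (percent of soldiers performing a task, and relative time spent on it) can be sketched as follows. This is not CODAP itself, which produces many more reports. The function and field names are hypothetical. The step that converts each respondent's relative-time ratings into percentages of that respondent's total time is an assumption about common occupational-survey practice.

```python
from dataclasses import dataclass


@dataclass
class TaskSummary:
    task: str
    pct_performing: float  # percent of all respondents who checked the task
    pct_time: float        # mean share of work time, averaged over ALL respondents


def summarize_inventory(responses, tasks):
    """Rank tasks from job inventory checklists (simplified, CODAP-like sketch).

    responses: one dict per soldier mapping task -> relative-time rating
               (e.g., on a 1-9 scale); unchecked tasks are simply absent.
    tasks:     the full list of inventory items.
    """
    n = len(responses)
    # Assumed conversion: each rating as a percentage of that soldier's rating total.
    shares = []
    for r in responses:
        total = sum(r.values())
        shares.append({t: 100.0 * v / total for t, v in r.items()} if total else {})

    summaries = []
    for t in tasks:
        performing = sum(1 for r in responses if t in r)
        # Non-performers contribute 0 percent, so shares across all tasks sum to 100.
        mean_time = sum(s.get(t, 0.0) for s in shares) / n
        summaries.append(TaskSummary(t, 100.0 * performing / n, mean_time))

    # Rank by percent performing, breaking ties by percent time spent.
    summaries.sort(key=lambda s: (s.pct_performing, s.pct_time), reverse=True)
    return summaries
```

For example, three hypothetical respondents might rate tasks `"inspect vehicle"`, `"replace filter"`, and `"road test"`. Tasks that many soldiers perform and that take up much of their time rise to the top of the list, which is the starting point for choosing critical tasks.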

Relevant  Job  and  Task  Analysis  Methods 

Job and task analysis (TA) activities are at the core of the work to be done in Task 5. While the development of appropriate performance measures is the primary product of the task, the job and task analysis activities to be carried out are the primary input to that product. Several issues need to be kept in mind as this work is carried out. These issues involve the accuracy, completeness, and appropriateness of job and task analysis activities.

In considering these issues, we need always to balance two factors that tend to work against each other: the economy and efficiency of using existing Army job and task information vs. the need to supplement such information with new inputs from the project itself. Obviously, time and cost must be taken into account in striking a proper balance between conducting new task analyses and supplementing existing ones.




Montemerlo and Eddowes (1978) reviewed more than 100 "how-to-do-it" manuals for job and task analysis developed between 1950 and 1976, many of them military in origin. Each of the manuals was designed to proceduralize and systematize the collection and analysis of job and task information. The authors contend that such efforts have led to an oversimplification of the process, so that the information obtained often does not accurately reflect the jobs and tasks being analyzed. Miller (1973) expressed this concern very well:

The space allowed (on the TA form) tyrannizes the space
used, which in turn tyrannizes what information will be
entered which tyrannizes what information one will
think about for entering into the format.


Such concerns have been expressed most specifically for those jobs (MOS) or parts of jobs (duty positions) that are not "proceduralizable." These kinds of activities are complex and interact synergistically with other tasks and subtasks within or outside the job. The danger is that the task analysis procedure itself tends to convert the subtleties of these jobs into a set of invariantly ordered steps, thus distorting the true nature of the job. Performance tests developed from such an analysis would similarly distort the criterion measures.


Another concern related to accuracy and completeness is the reliance placed on existing documentation and records. Prelewicz (1977) reviewed the task analytic work carried on at eight different proponent schools and noted the following problems:


(1) Those who do the work are often not adequately trained.

(2) Those in supervisory positions sometimes change the results without consulting the analysts.

(3) The analysts do not reflect the point of view of the job incumbent and resist change.

(4) Analysts think in "big chunks" rather than at the "how to do it" level, thereby missing important information.

(5) Conditions and standards are not considered in detail or are given cursory treatment. Standards are sometimes "made up" because you need to say something. "Must complete task in six minutes" may be a completely irrelevant "requirement."

(6) Too heavy a reliance is placed on the inputs of the proponent schools, including previous TA work and documentation, and local (school) SME.

(7) Not enough time is spent with current job incumbents, observing what they actually do in the field.

(8) Command emphasis on TA work is often not adequate to allow the school to do the job properly. Time, money, and personnel are not available in sufficient amounts.


By focusing TA activities on training concerns, analysts tend to look at how jobs and tasks are carried out at the novice or trainee level rather than at the level of the highly skilled or professional performer (Klein, 1978). The underlying assumption is that skilled performance is simply unskilled performance done better: the novice is said to be slower, to make more errors, and not to attend to the proper stimuli or cues for initiating or terminating steps in the task. This is the building-block approach to skill and proficiency development. The counterargument holds that the proficient person does things differently than the novice, and that a new set of skills evolves out of the earlier ones (DeMaio et al., 1976; Knoop & Welde, 1973; Klein, 1976). Thus, one needs to capture and describe the skilled performer's behavior in order to (1) develop adequate measures of his or her performance and (2) develop predictors of that performance.


This problem is consistent with the findings of Rose, Shettel, and Wheaton (1981), who studied the relationship among tasks as they are described in selected Soldier's Manuals (a product of TA), those tasks as measured by the SQT, and those same tasks as actually performed on the job. In fact, 73% of the 1,223 soldiers from whom data were collected noted that tasks in the Soldier's Manuals differed from the way they are done on the job. Many of the soldiers noted that this difference often reflected the gap between the way the job should be done by a trainee while learning and the way it can be done efficiently and effectively by a skilled practitioner.


It seems clear that reliance on existing task and job information must be tempered with a number of cautions. We need to ensure that we have current, complete, accurate, and relevant information that serves the needs of the developers of both criterion performance measures and predictor batteries. This means that existing task analytic information needs to be verified independently by SME from schools and field units, and by comparing the information contained in different Army documents.


Different job and task analysis methods are best suited for different purposes. An analysis of such alternatives was carried out in 1974 by Brumback, Romashko, Hahn, and Fleishman. Five job analysis methods were evaluated against 13 criteria. These methods included the job inventory approach used by the military (Christal, 1969), the U.S. Department of Labor Functional Job Analysis approach (Fine, 1955), the Position Analysis Questionnaire (PAQ) of McCormick (1972), the Fleishman Abilities Analysis approach (Fleishman, 1972), and the Critical Incident Technique (Flanagan, 1954). The authors concluded that no method is uniformly the best; each is weak in at least one respect compared to the others.


In any complex job setting, therefore, a multi-method approach is indicated, in which requirements are determined both quantitatively (using existing Army job inventory procedures) and qualitatively (by applying one or more of the more judgmental approaches). This suggests an eclectic approach that borrows from different job analysis methods those parts that best serve specific project needs. For example, we will use the critical incident technique to develop criterion rating scales but use the AOSP job inventory approach to identify important tasks. In this way, we believe that we can compensate for the concerns relating to the accuracy, completeness, and appropriateness of the task and job information obtained in Task 5, while remaining within established time and cost parameters.

In sum, different methods of performance measurement have different advantages and disadvantages. Despite their cost, hands-on performance tests, correctly developed and administered, cannot be equaled in job relevance, fairness, or acceptability to examinees (Schmidt et al., 1977); nor is there a known substitute for a performance test in measuring proficiency on tasks involving psychomotor skill. Knowledge tests, if used for the right kinds of job tasks and linked methodically to knowledge-based task elements, have wide applicability and acceptable validity, and are exceptionally efficient. Performance ratings are the most remote measures, but they permit the measurement of affective dimensions that cannot feasibly be tapped by other means.


SPECIFIC  OBJECTIVES 


The specific objective of Task 5 is to develop reliable, valid, and economical measures of first- and second-tour job performance of enlisted personnel in a sample of nine MOS. These measures will serve both as:

(1) Data collection instruments for establishing the
relationships among various kinds of predictors and
criterion measures, and

(2)  Prototypes  for  the  development  of  performance  measures 
for  additional  MOS  and/or  MOS  clusters. 

Two different kinds of performance measures will be developed. The first will be direct measures of task performance (e.g., the average time it takes a soldier to troubleshoot and repair a malfunctioning electrical component). For measures of this kind, incumbents must be evaluated under carefully structured and standardized conditions. The second kind will consist of measures based on indirect evidence of performance (knowledge tests and ratings by supervisors or peers).

Both kinds of measures are needed. Instruments of the second, cheaper type are needed for operational use in monitoring performance and for the Army's continuing efforts to improve selection and classification (which will not end with this project). Instruments of the first type are needed in order to develop the second. They are also needed to calibrate periodically the accuracy of selected predictor instruments. The careful calculations of utility that will be made in this project would be open to serious challenge if they were based solely on less direct measures of performance.
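One way to see why the choice of criterion matters for utility is to look at one widely used selection-utility model, the Brogden-Cronbach-Gleser formulation. The plan does not say which utility model the project will adopt, so the equation below is illustrative only:

```latex
\Delta U = N_s \, T \, r_{xy} \, SD_y \, \bar{z}_x - N_a \, c
```

Here $N_s$ is the number of soldiers selected, $T$ their average tenure, $r_{xy}$ the validity of predictor $x$ against job performance $y$, $SD_y$ the standard deviation of job performance in value terms, $\bar{z}_x$ the mean standardized predictor score of those selected, $N_a$ the number of applicants tested, and $c$ the cost of testing each one. Because $\Delta U$ is linear in $r_{xy}$, any distortion in the validity estimate carries proportionally into the utility estimate. That is why the criterion against which $r_{xy}$ is estimated needs to be a direct measure of task performance rather than one that only mediates it.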


OVERALL  SUMMARY  OF  PROCEDURE 


The subtasks, activities, and milestones for accomplishing Task 5 are shown in Figure 5-1. The work begins in Subtask 1 with a review of the literature to identify job analysis and performance measurement methodologies for use by researchers in subsequent analytic and development phases of the project. Research and staffing plans will also be prepared in this subtask.

The job and task analyses in Subtask 2 serve four purposes. First, to provide Task 2 researchers with information about the criterion constructs underlying a variety of jobs within and across MOS, we begin with an analysis of the MOS performance domain. Second, to describe and analyze the task content of the MOS-specific performance domain, we will identify the duties and tasks performed by soldiers in the MOS selected for the research. Third, we will describe and analyze each task for which a performance test will be developed. Finally, we will describe and analyze the performance of soldiers on the job in terms of underlying dimensions that distinguish superior, successful, and unsuccessful incumbents. The group of MOS for evaluation will be specified in the context of this subtask.

The inputs to Subtask 3 are the recommended performance measurement techniques from the literature review in Subtask 1, and the designated MOS, tasks, task descriptions, and behavioral analyses from Subtask 2. Subtask 3 products are new performance measures of task-specific and general job dimensions of MOS proficiency. The measures and the administrative support materials for their testing will be submitted for COR review.

The objective of Subtask 4 is to compile existing MOS performance measures and to evaluate their usefulness. Given the existing and the approved new measures, we will begin field testing (Subtask 5) to assess their quality. The measures of MOS-specific performance will be evaluated in terms of psychometric considerations, content and construct validity, and practical utility and cost.

In Subtask 6, based on the results of the field tests, we will assemble existing and new performance measures into a component set. In Subtask 7, we will use the component set of measures to test two cohorts, the first in FY83/84 and the second in FY86/87. The proposed schedule enables us to develop, field test, and compile measures in time to apply them to the FY83/84 cohort before the incumbents finish their first tour. We plan a longitudinal design in which we will obtain performance measures on those soldiers in the cohort who reenlist and are available in their second term. Finally, we will replicate the FY83/84 cohort data collection effort by testing soldiers in the FY86/87 cohort in their first and second terms. Throughout, Task 5 data collection efforts will be coordinated with those of Tasks 2, 3, and 4. This will enhance the integration of research on pre-induction predictors, training/school measures, and Army-wide performance measures with the job-specific criterion data, and will also reduce demands on Army resources.




The  research  activities  of  job  analysis,  task  analysis,  performance 


PROCEDURE 


Subtask 5.1: Review Literature and Plan Research

The cornerstone for Task 5 work is a complete and easy-to-access literature file that addresses state-of-the-art methods of task analysis and performance measurement. Another prerequisite is a detailed plan for the research, including its management and staffing. Thus, in this first subtask, two activities are planned: to review the relevant literature and to prepare research and management plans.

Activity 5.1.1 Review relevant literature. Drawing on the library resources of the three contractor organizations and using their reference accessing capabilities, we will compile documents that pertain to (a) job, task, and behavioral analysis, and (b) job performance measurement. Theoretical and empirical work on methods of analysis and measurement will be reviewed along with the results of pertinent Army applications in the form of completed job and task analyses and job-task performance tests. These documents and data will be reviewed and evaluated for relevance to the project. We especially will seek techniques and methods that can supplement existing Army information and current procedures. Documents and data of interest will be abstracted and catalogued for use by the Task 5 staff and the rest of the project staff. The job and task analysis review will focus on identifying methods that will enable us to: (a) define the Army job performance domain in terms of constructs that can guide the selection of predictors in Task 2; (b) partition job behavior into tasks or dimensions of performance that best represent that job; and (c) detail the job tasks or behaviors in ways that provide for the methodical development of performance measures. The performance measurement review will focus on a comparative evaluation of different methods of testing and rating job performance. The reviews will be completed by the sixth project month.

Activity 5.1.2 Prepare research plan. A draft research plan has been prepared. The plan describes the major subtasks and activities that will be performed, the interrelationships among the activities both within and across the subtasks and with those in other tasks, the schedule of task accomplishment, and the troop support requirements. The plan was revised on the basis of comments received from in-house reviewers, the COR, and the Project A Advisory Groups.

Activity 5.1.3 Prepare management plan. To support the technical research plan, a corresponding plan for managing its execution was prepared. The plan allocates and budgets the staff, travel, material, service, and overhead resources required to complete each Task 5 subtask. The plan, along with those for the other project tasks, will serve as input to the management information system through which project costs and progress will be monitored. A draft of this plan was completed by the fifth project month; the final plan is to be completed by the end of the seventh month.


Subtask 5.2: Plan and Conduct Job and Task Analyses


The goal of this subtask is to establish the critical elements of effective MOS-specific job performance. This information is essential to the completion of Tasks 2, 3, and 5. The specific objectives of this subtask are:

(1) To identify the sample MOS for the overall project research, and the subset of 9 MOS for Task 5 research.

(2) To supply Task 2 researchers with information about criterion constructs underlying performance of a variety of jobs, both within and across MOS.

(3) To describe and analyze the performance of soldiers on the job in terms of those underlying dimensions that distinguish among superior, successful, and unsuccessful job incumbents.

(4) To describe and job-analyze the MOS-specific performance domain in terms of its task content.

(5) To describe and analyze each task that is to be represented by one or more job performance criterion measures in terms of how each task is performed in the job setting.


Subtask 5.2 will consist of seven major activities:

(1) Selecting a sample of MOS.

(2) Collecting MOS-specific and job/task analytic information (MOS A).

(3) Conducting job and task analyses on MOS A.

(4) Completing task descriptions on MOS A.

(5) Conducting behavioral analyses on MOS A.

(6) Conducting job, task, and behavioral analyses on MOS B.

(7) Conducting job, task, and behavioral analyses on MOS A' and B' (second tour).


We describe each of these steps in turn. Each step implicitly contains an internal review phase, and each step will be completed when its final products have been approved by the appropriate senior staff.




Activity 5.2.1 Cluster MOS and select the sample(s) of MOS. A provisional sample of 19 MOS has been identified (see Table 1). The overriding criteria for the composition of the MOS sample are:

(1) That the number of job incumbents is large enough to produce reliable results from the data analysis; and

(2) That the variety of job skills collectively found in the sample be reasonably representative of the Army's job skill domain.


To meet these criteria, MOS were selected on the basis of (a) the number and mix of people to be trained in the job, (b) the Career Management Field (CMF) to which the job belonged, and (c) the representativeness of the MOS set with respect to the types of jobs required to accomplish the Army's mission. The procedure generally entailed selecting a variety of CMF within strata of MOS density.

(1) A data table was generated listing, for each Army MOS, the number of troops acquired in FY81 and the number of those who are female, Black, or Hispanic.¹ The CMF to which an MOS belonged was also listed.

(2) A first pass was made through this table searching for MOS which had at least 1,000 troops overall and a minimum of 300 women, 300 Blacks, and 100 Hispanics. This pass produced 11 MOS in eight CMF. The first eight MOS were identified by selecting the largest from each CMF.


¹FY81 accessions data were available. It was assumed that those data would represent reasonably well the relative distribution over MOS of accessions in FY83 and later.




(3) Next, the subgroup criteria were further relaxed by eliminating the requirement for Hispanic representation. This produced four additional MOS, but all were in CMF already present in the initial set of eight. On those grounds, all four were eliminated from further consideration.

(4) Again the criteria were changed, this time by eliminating the requirement for female representation but restoring the minimum requirement for 100 Hispanics. Against these constraints, eight new MOS surfaced, representing four new CMF. Four MOS were added to the initial set of eight by retaining the largest in each new CMF.

(5) A final change in criteria was made in which the total accessions constraint was reduced from 1,000 to 500 and all requirements for minority representation were dropped. An additional 29 MOS in 14 CMF emerged. Seven of these 14 CMF were represented in the set of 12 MOS already selected. Of the remaining seven, one (CMF 98, Intelligence) was dropped because it is classified. That left eight MOS in six CMF. The largest MOS in each of the six remaining CMF was chosen, increasing our sample to 18.
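Steps (2) through (5) amount to a threshold-relaxation procedure: apply minimum accession counts, keep the largest qualifying MOS in each CMF not yet represented, then relax the minimums and repeat. The Python sketch below illustrates that logic under stated assumptions. The record fields, the `PASSES` table, and the accession figures are hypothetical, not the FY81 data, and for simplicity the exclusion of classified CMF is applied in every pass rather than only in step (5). Step (3)'s outcome falls out naturally, since MOS in already-represented CMF are skipped.

```python
from dataclasses import dataclass

@dataclass
class MOS:
    code: str
    cmf: int
    total: int      # FY accessions
    women: int
    blacks: int
    hispanics: int

# Minimums for each pass: (total, women, Blacks, Hispanics); None = not required.
PASSES = [
    (1000, 300, 300, 100),     # step (2)
    (1000, 300, 300, None),    # step (3): Hispanic requirement dropped
    (1000, None, 300, 100),    # step (4): female requirement dropped, Hispanic restored
    (500, None, None, None),   # step (5): lower total, no minority requirements
]

def meets(m, mins):
    """True if the MOS satisfies every minimum that applies in this pass."""
    counts = (m.total, m.women, m.blacks, m.hispanics)
    return all(req is None or n >= req for n, req in zip(counts, mins))

def select_sample(pool, passes=PASSES, excluded_cmf=frozenset()):
    """In each pass, add the largest qualifying MOS from each CMF not yet represented."""
    selected, covered = [], set()
    for mins in passes:
        best = {}
        for m in pool:
            if m.cmf in covered or m.cmf in excluded_cmf or not meets(m, mins):
                continue
            if m.cmf not in best or m.total > best[m.cmf].total:
                best[m.cmf] = m
        for cmf in sorted(best):
            selected.append(best[cmf])
            covered.add(cmf)
    return selected

# Hypothetical accession counts, for illustration only.
pool = [
    MOS("11B", 11, 9000, 0, 2000, 400),
    MOS("71L", 71, 4000, 1500, 1300, 200),
    MOS("76Y", 71, 3500, 900, 1100, 150),
    MOS("13B", 13, 5000, 0, 1500, 250),
    MOS("51B", 51, 700, 20, 150, 30),
    MOS("96B", 96, 800, 100, 100, 20),
]
print([m.code for m in select_sample(pool, excluded_cmf={96, 98})])
```

With these made-up figures the sketch selects 71L in the first pass (76Y loses because it shares CMF 71 and is smaller), 11B and 13B once the female requirement is dropped, and 51B when the total is lowered to 500; 96B never enters because its CMF is excluded.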


A further, indirect indication of the mix of job skills represented in the sample is the range of ASVAB composites and component subtests pertinent to each MOS. All subtests and all but one (EL) of the nine composites were represented in the 18 MOS initially selected.


The extent to which ASVAB measures should be considered in evaluating the MOS sample for the range of job skills covered is debatable. On one hand, it seems tautological to choose or confirm the choice of criterion job skills on the basis of aptitude measures previously validated against such criteria. On the other hand, since one of the objectives of this project is to revalidate the present ASVAB against training achievement and job performance measures, it seems reasonable to choose a sample of MOS that gives ASVAB a fair chance for revalidation. Accordingly, we chose a 19th MOS (27E), which represented the EL aptitude composite.


The composition of the sample was also examined from the standpoint of mission criticality by comparing it with a list of 42 MOS identified by the Army as high priority for mobilization training.² The 42 MOS represent 17 CMF, 13 of which are contained within our set of 19. Of the four not in our sample, two are classified (CMF 96 and 98) and two are small (CMF 23 and 84). The six CMF in our sample that are not on the mobilization training priority list generally represent jobs for which there are civilian counterparts, a type of job purposely excluded from the mobilization list.


This initial set of 19 MOS represents 19 of the Army's 30 CMF.³ It includes only 5 percent of Army jobs but 44 percent of the soldiers recruited in FY81. Similarly, of the 15 percent women in the 1981 cohort, 44 percent are represented in the sample; of the 27 percent Blacks, 44 percent are represented in the sample; and of the 5 percent Hispanics, 43 percent are represented. While female and minority representation is high in absolute terms, in relative terms it remains about the same as in the population: the sample is 15 percent female, 27 percent Black, and 5 percent Hispanic.
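The last sentence follows from the coverage figures by simple arithmetic. If subgroup g makes up a share p of all accessions, the sample MOS capture a fraction c_g of that subgroup, and they capture a fraction c of all accessions, then g's share of the sample is p × c_g / c. The short Python check below reproduces this using only the percentages given above; the function name is ours.

```python
def share_in_sample(pop_share, subgroup_coverage, total_coverage):
    """Subgroup's share of the sample.

    pop_share: subgroup's share of all accessions (e.g. 0.15 for women)
    subgroup_coverage: fraction of the subgroup found in the sample MOS
    total_coverage: fraction of all accessions found in the sample MOS
    """
    return pop_share * subgroup_coverage / total_coverage

TOTAL_COVERAGE = 0.44  # sample MOS hold 44 percent of FY81 recruits
for name, pop, cov in [("female", 0.15, 0.44), ("Black", 0.27, 0.44),
                       ("Hispanic", 0.05, 0.43)]:
    print(f"{name}: {share_in_sample(pop, cov, TOTAL_COVERAGE):.1%}")
```

This prints 15.0%, 27.0%, and 4.9%, which round to the 15, 27, and 5 percent reported: because each subgroup's coverage (43 to 44 percent) matches the overall coverage, its relative share is essentially unchanged.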

²ODCSOPS (DAMO-ODM), DF, 2 Jul 82, Subject: TRR Training Priorities.

³Of the 11 CMF not represented, two are classified (CMF 96 and 98), two (CMF 33 and 74) have fewer than 500 FY81 accessions, and seven (CMF 23, 28, 29, 79, 81, 84, and 74) have fewer than 300 FY81 accessions.

Nine of the 19 MOS were tentatively earmarked for the job-specific performance measurement phase of the project. These were selected, as a subset, with the same general criteria used in identifying the parent list of 19. Since the larger list is composed of five combat and 14 non-combat MOS, it seemed reasonable to see that these categories were represented in the subset of nine. It was further assumed that, to keep travel and field performance measurement costs within bounds, only the largest MOS should be selected. So the three large combat MOS, 11B (Infantryman), 13B (Cannon Crewman), and 19E/K (Tank Crewman), were selected first. Of the 14 non-combat MOS, eight are large and have race and gender subgroups substantially represented. Since five different ASVAB composites are represented among the eight, one MOS was selected for each. Both 64C (Motor Transport Operator) and 94B (Food Service Specialist) share the OF aptitude composite and are roughly the same size, but the former was chosen because it is considered a priority MOS for mobilization. The two clerical (CL) MOS differ neither in size nor in their mobilization priority status, so 71L (Administration Specialist) was chosen over 76Y (Unit Supply Specialist) chiefly because it has more women. Both MOS with the ST composite were selected, since both have priority mobilization status. Thus, the nine MOS tentatively designated for Task 5 work are:

(1)  11B  -  Infantryman 

(2)  13B  -  Cannon  Crewman 

(3)  19E  -  Tank  Crewman 

(4)  05C  -  Radio  TT  Operator 

(5)  63B  -  Vehicle  and  Generator  Mechanic 

(6)  64C  -  Motor  Transport  Operator 

(7)  71L  -  Administration  Specialist 

(8) 91B - Medical Care Specialist

(9)  95B  -  Military  Police 

An initial group of four, highlighted above, was selected and designated as Group A. While work will begin on Group A, the other MOS are subject to further review. Lack of support for CMF as a job classification system is the main reason for this tentativeness: we have been unable to document the CMF structure as a systematically derived behavioral taxonomy of Army jobs.


As a check on CMF, we have undertaken a direct cluster analysis of MOS. Members of the contractor research staff and ARI, together with Army officers (approximately 25 in all), have been given the task of sorting a sample of MOS into groups of their choosing based on perceived similarities and differences in job activities as described in AR 611-201. The sample of 111 MOS, which represents 47% of the population of 238 Skill Level 1 Active Army MOS with conventional ASVAB entrance requirements, includes the 84 large MOS (300 or more new job incumbents yearly) plus an additional 27 selected randomly but proportionately by CMF. Data from the sorting task were clustered, and the initial results were used to check the dispersion of our provisional sample of 19 MOS. On the basis of these results and guidance received from our Governance Advisory Group, two MOS that had tentatively been selected were replaced by 51B and 27E, which are in the same CMF and involve the same Aptitude Area Composites as the replaced MOS (62E and 31M).
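The plan does not state which clustering method is applied to the sorting data. One common treatment of sorts like these, offered here only as an illustrative sketch, is to compute for every pair of MOS the proportion of judges who placed the two in the same group, then run an agglomerative (average-linkage) clustering on that similarity matrix, stopping when the best remaining merge falls below a cutoff. The judges' sorts and the 0.5 cutoff below are hypothetical.

```python
from itertools import combinations

def co_occurrence(sorts, items):
    """Proportion of judges who placed each pair of items in the same pile."""
    sim = {pair: 0.0 for pair in combinations(sorted(items), 2)}
    for piles in sorts:                      # one judge's sort = list of piles
        for pile in piles:
            for a, b in combinations(sorted(pile), 2):
                sim[(a, b)] += 1
    return {pair: count / len(sorts) for pair, count in sim.items()}

def similarity(sim, a, b):
    return sim[(a, b)] if a < b else sim[(b, a)]

def average_linkage(sim, items, threshold):
    """Merge clusters while the mean between-cluster similarity >= threshold."""
    clusters = [[i] for i in sorted(items)]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            mean = sum(similarity(sim, a, b) for a, b in pairs) / len(pairs)
            if mean > best:
                best, best_pair = mean, (i, j)
        if best < threshold:
            break
        i, j = best_pair                     # i < j, so deleting j keeps i valid
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters

items = ["11B", "19E", "13B", "71L", "76Y", "63B"]
sorts = [  # hypothetical sorts by three judges
    [["11B", "19E", "13B"], ["71L", "76Y"], ["63B"]],
    [["11B", "19E"], ["13B"], ["71L", "76Y", "63B"]],
    [["11B", "13B", "19E"], ["71L", "76Y"], ["63B"]],
]
print(average_linkage(co_occurrence(sorts, items), items, threshold=0.5))
```

Here the three combat MOS and the two clerical MOS form clusters, while 63B stays separate because only one of three judges grouped it with the clerical jobs. Factor analysis of the same similarity matrix would be an alternative; the plan mentions both approaches for the later, comprehensive analysis.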

The foregoing method of sampling provides MOS that are representative of the range of job skills in the MOS population while being large enough for reliable estimation of individual test validities and of differential validity across racial and gender groups. Yet, as stated in the Introduction, additional analyses of the MOS domain are required to support generalization of validities from the sample of 19 to the other 200-plus MOS. The next step in this direction will be to reaffirm the representativeness of the 19 through comprehensive cluster analysis or factor analysis of the MOS domain. Gaps in the sample of 19 revealed through this comprehensive analysis can be filled by adding the necessary MOS to those researched in the FY86/87 cohort. The procedure for this further MOS analytic work will, as described in the Introduction, be guided largely by the results of the pilot research now under way. To allow time for development of measures for any new MOS, the comprehensive analysis of the MOS domain will have to be completed by the middle of 1985.


Activity 5.2.2 Collect MOS-specific and job/task analytic information. We will obtain job and task information specific to the selected MOS.

This effort will take place during the two months that precede the planning of the job and task analyses of each wave of MOS. This will occur in:

(1) March-April 1983 for MOS A

(2) July-August 1983 for MOS B

(3) April-May 1985 for MOS A' and B'.

General information on enlisted MOS is available in the research team's libraries or has been obtained from Army sources such as MILPERCEN and the Soldier Support Center-National Capital Region. Some job and task descriptive data on the four initial MOS (MOS Group A) have already been obtained from the Army Occupational Survey Center. Additional specific information about each MOS selected for performance measurement will be obtained from at least the following Army agencies:

(1) Soldier Support Center - Army occupational survey reports and questionnaires; anticipated changes in MOS.

(2) Army Troop Support Center - Latest versions of Soldier's Manual, SQT (hands-on and written), duty position information; process for selecting which tasks to include in SM and SQT; anticipated changes in MOS task composition.




(3) Proponency Coordination Center - Issues, current or anticipated, that will affect task composition or duty positions of MOS, the distribution of troops in units or commands, or the topics and tasks trained in MOS schools.

(4) MOS Proponent Schools - Copies of current hands-on and written tests; task criticality lists; duty position information; anticipated changes in MOS task composition; relationship of tasks trained and tasks listed in SM and SQT; completed job and task analysis worksheets (TRADOC Form 550); Trainer's Guides.

(5) TRADOC Adjutant General - Educational Division - MOS task information from RCA Baseline Skills Project.


Relevant  documents  and  reports  will  be  acquired  and  housed  within  the 
HumRRO  project  library. 


Identify and analyze constructs and attributes. While Task 2 staff need a basis for tying their selection of predictors to criterion constructs, time and resources do not permit a comprehensive front-end analysis of the Army job domain. Data from four sources can be used, however, to provide a timely set of job-specific performance constructs.


One source is the outcome of the MOS cluster analysis described in Activity 5.2.1. These results may be used to select a representative but manageable number of MOS on which to focus an analysis of criterion constructs. This can be done, for instance, by selecting from each factor or cluster the two or three most representative MOS (highest factor loading or index of belongingness) for detailed analysis.
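A minimal sketch of that selection rule, assuming each MOS already carries a cluster label and a belongingness score from the cluster analysis. The cluster names and scores are invented for illustration; factor loadings would be handled the same way.

```python
from collections import defaultdict

def most_representative(assignments, k=2):
    """assignments: iterable of (mos, cluster, belongingness score).
    Returns the k highest-scoring MOS in each cluster."""
    by_cluster = defaultdict(list)
    for mos, cluster, score in assignments:
        by_cluster[cluster].append((score, mos))
    return {
        cluster: [mos for _, mos in sorted(members, reverse=True)[:k]]
        for cluster, members in sorted(by_cluster.items())
    }

assignments = [  # hypothetical cluster memberships and scores
    ("11B", "combat", 0.91), ("19E", "combat", 0.84), ("13B", "combat", 0.77),
    ("71L", "clerical", 0.88), ("76Y", "clerical", 0.79), ("75B", "clerical", 0.70),
    ("63B", "mechanical", 0.86), ("52D", "mechanical", 0.80),
]
print(most_representative(assignments, k=2))
```

Setting `k=3` gives the "three most representative" variant; clusters with fewer members simply return all of them.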




The form that this analysis takes will depend chiefly on the quality of data from a second source, the Army Occupational Survey Program (AOSP). The attractiveness of the AOSP's CODAP data is that a massive base of job task data for Army MOS can be compared, sorted, consolidated, or otherwise examined by computer. Assuming that CODAP data are available on the sample of MOS mentioned, characteristics of the hundreds of job tasks in each can be analyzed to develop useful job performance constructs. There are probably many ways to summarize the data. One that we have begun to explore is to group tasks on the basis of action words. This can be done separately by MOS and then consolidated across MOS. Performance constructs stated in job activity terms (e.g., "troubleshoots electronic/mechanical systems," "fills out forms," "engages targets," "assembles/disassembles mechanical equipment," "identifies targets," "cleans equipment") may be determined in this way. Such clusters of job activities, when supplemented by dimensions from the behavioral analysis described in Activity 5.2.5, should provide a useful set of job-specific performance constructs against which predictor constructs may be evaluated in Task 2 (Subtask 2.4). For delivery to Task 2, each construct will be named, briefly defined, clarified by examples, and identified as to origin.
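The action-word grouping lends itself to a simple computer pass over task statements. The sketch below works under stated assumptions: it treats the first word of each task statement as its action word, maps related verbs to a common label through a small hand-built table, groups tasks by that label within each MOS, and then consolidates the groups across MOS. The task statements and the `SYNONYMS` table are hypothetical; real CODAP inventories would need more careful parsing.

```python
from collections import defaultdict

# Hypothetical table mapping raw action words to construct labels.
SYNONYMS = {
    "troubleshoot": "troubleshoot",
    "assemble": "assemble/disassemble", "disassemble": "assemble/disassemble",
    "clean": "clean",
    "prepare": "fill out forms", "complete": "fill out forms",
}

def action_word(statement):
    """Treat the first word of a task statement as its action word."""
    verb = statement.split()[0].lower()
    return SYNONYMS.get(verb, verb)

def group_by_mos(tasks_by_mos):
    """Group task statements by action word separately within each MOS."""
    grouped = {}
    for mos, tasks in tasks_by_mos.items():
        groups = defaultdict(list)
        for task in tasks:
            groups[action_word(task)].append(task)
        grouped[mos] = dict(groups)
    return grouped

def consolidate(grouped):
    """Merge per-MOS groups into cross-MOS constructs: action -> list of MOS."""
    constructs = defaultdict(set)
    for mos, groups in grouped.items():
        for action in groups:
            constructs[action].add(mos)
    return {action: sorted(mos_set) for action, mos_set in sorted(constructs.items())}

tasks = {  # hypothetical CODAP-style task statements
    "63B": ["Troubleshoot fuel system", "Disassemble carburetor", "Clean air filter"],
    "71L": ["Prepare duty roster", "Complete leave form"],
    "11B": ["Clean M16A1 rifle", "Assemble M60 machine gun"],
}
print(consolidate(group_by_mos(tasks)))
```

Constructs that recur across several MOS (here "clean" and "assemble/disassemble") would be candidates for general activity constructs, while those confined to one MOS remain job-specific.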

Activity 5.2.3 Conduct job and task analysis on MOS A. The relevant information for each MOS will be compiled, reviewed by staff, and prepared for analysis beginning in February and continuing through May 1983. Research staffs of AIR and HumRRO will meet periodically to plan the job and task analytic procedures that will be followed, to exchange preliminary findings of the analyses, and to review the final results. These meetings will ensure consistent outcomes across different MOS. We will conduct the job and task analyses starting in May 1983 and continuing through July 1983.


The terms "job analysis" and "task analysis," as used here, refer to a process of compiling existing information about each MOS (i.e., duty positions, tasks, task content, and procedures), reconciling differences among various sources of information, and verifying the accuracy and validity of the revised job/task information.

To make certain that we have a realistic understanding of the MOS we are analyzing, we will make one- to two-day visits to nearby Army posts. We will observe troops performing the most frequently performed and essential tasks associated with the MOS. Because of their proximity to AIR and HumRRO offices, candidate sites for our visits include Ft. Knox, Ft. Belvoir, Ft. Meade, and the Aberdeen Proving Ground. These visits will not require the local command to provide any substantial personnel support. We would require only an escort at each post who could direct us to the appropriate work sites and, if necessary, explain the general nature of the work we would observe.

The initial phase of the job/task analysis will be to identify the duty position(s) to work with in the MOS. Our concern here is twofold. First, we want to identify the duty position(s) with the largest number of incumbents in order to ensure adequate numbers of troops to make up our testing samples. Second, we want to choose duty positions that mirror the characteristics that led to the selection of the MOS in Activity 5.2.1.

The first step is to identify the official and practical duty positions. The official duty positions are contained in AR 611-201. They may be subdivided by practical factors, most likely equipment. For example, the 13B MOS has 16 official duty positions; in practice, the lead position, cannoneer, is divided further by type of gun. Four sources provide information on practical duty positions: the Soldier's Manual (SM), CODAP surveys, Trainer's Guides, and SME at the proponent school and in units.

Once we have a general map of the MOS by duty positions at skill levels 1 and 2, we must judge the similarity of the positions. From a theoretical perspective, we would be most comfortable if we tested only incumbents of the lead duty position. But that duty position may not have enough incumbents. Even if there are enough incumbents, we could get our sample from fewer units if the scope of the job to be evaluated were somewhat broader.

We will base our evaluation of the similarity of duty positions primarily on the tasks and duties performed in each position. We will request the Army Occupational Survey Center to provide CODAP survey data reports showing tasks performed by duty position. These reports show tasks that are common across more than one duty position, as well as those specific to a single duty position. The CODAP task list will be augmented by task and duty position information from the SM, the Trainer's Guide, and the proponent school. The resulting task-by-duty-position list will be submitted for review by subject matter experts. The purpose of the review is to double-check the task list for possible recent changes in doctrine or practice. For this purpose, a few knowledgeable judgments are preferred to many marginally informed opinions. Thus, one or two NCO from the proponent school who are familiar with current MOS doctrine and one or two from a FORSCOM unit who work daily with the MOS will be asked to review the list. As shown in Tables 5.2.1 - 5.2.4, the review is expected to take one day per NCO at each location.
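The plan leaves the similarity judgment to staff and SME review of the task-by-duty-position list. As one possible quantitative aid, sketched here under assumptions, the overlap between two positions' task lists can be summarized with the Jaccard index (tasks shared divided by tasks performed in either position), alongside the tasks common to all positions and those unique to each. The duty positions and task lists below are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Tasks shared by two positions divided by tasks performed in either."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def position_overlap(tasks_by_position):
    """Return tasks common to all positions, tasks unique to each position,
    and the pairwise Jaccard overlap between positions."""
    common = set.intersection(*tasks_by_position.values())
    specific = {}
    for pos, tasks in tasks_by_position.items():
        others = set().union(*(t for p, t in tasks_by_position.items() if p != pos))
        specific[pos] = tasks - others
    pairwise = {
        (p, q): round(jaccard(tasks_by_position[p], tasks_by_position[q]), 2)
        for p, q in combinations(sorted(tasks_by_position), 2)
    }
    return common, specific, pairwise

positions = {  # hypothetical duty positions and task lists
    "cannoneer (gun A)": {"load", "lay", "set fuze", "clean bore"},
    "cannoneer (gun B)": {"load", "lay", "set fuze", "operate rammer"},
    "ammunition handler": {"set fuze", "inventory ammunition"},
}
common, specific, pairwise = position_overlap(positions)
print(common)
print(pairwise)
```

In this toy case the two cannoneer variants share three of five tasks (0.6) and would plausibly be treated as one practical position, while the ammunition handler overlaps little with either (0.2). Any cutoff for "very similar" would remain a judgment for the staff.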

We will select the target duty positions primarily on the basis of the number of incumbents in the lead duty position and in those very similar to the lead position. AOSP provides the data for the official duty positions. If there are also practical duty positions, we will estimate the proportions of soldiers in each position. If a homogeneous group of duty positions provides a suitably large sample of soldiers, performance measurement will be limited to those soldiers performing the tasks relevant to their duty position. If there is not a homogeneous grouping, we have two choices:

(1) Track the data collection so that each soldier is tested not only on tasks common across duty positions but also on a sample of the tasks that make his duty position distinctive.

(2) Test only on the tasks that are common across duty positions. This alternative is less desirable because the supervisor ratings will be based largely on global performance of the distinctive tasks, so the tests and the ratings would reflect different parts of the job.
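The decision rule in the two preceding paragraphs can be written out as a small function. This is a sketch under stated assumptions: the similarity cutoff that defines "very similar to the lead position" and the minimum sample size are hypothetical parameters, since the plan fixes neither value, and the similarity scores might come from any task-overlap measure (for example, a Jaccard index on task lists).

```python
def choose_target_positions(incumbents, similarity_to_lead, lead,
                            min_similarity=0.7, min_sample=600):
    """Return the duty positions to test and the testing strategy.

    incumbents: position -> estimated number of soldiers
    similarity_to_lead: position -> task-overlap score with the lead position
    min_similarity, min_sample: hypothetical cutoffs, not set by the plan
    """
    group = [lead] + [p for p, s in similarity_to_lead.items()
                      if p != lead and s >= min_similarity]
    if sum(incumbents[p] for p in group) >= min_sample:
        # Homogeneous group is large enough: test only these soldiers.
        return sorted(group), "homogeneous group"
    # Otherwise choice (1): test all positions on common tasks plus a
    # sample of the tasks that make each position distinctive.
    return sorted(incumbents), "common plus distinctive tasks"

incumbents = {"lead": 450, "variant": 200, "other": 300}  # hypothetical
similarity = {"variant": 0.8, "other": 0.3}
print(choose_target_positions(incumbents, similarity, "lead"))
print(choose_target_positions(incumbents, similarity, "lead", min_sample=800))
```

The fallback implements choice (1) rather than choice (2), following the plan's stated preference.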

We will next identify the candidate tasks for which performance measures will be developed. Each task will be screened against these criteria:

(1) A sufficient proportion of incumbents perform the task;

(2) The task is not likely to change or disappear in the immediate future;

(3) The task requires individual rather than team proficiency; and

(4) The task is deemed critical or important.
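The four criteria act as successive filters on the candidate task list. A minimal sketch follows; the `Task` fields, the 30 percent "sufficient proportion" cutoff, and the criticality cutoff on a 1-7 scale are hypothetical placeholders, since the plan does not set numeric thresholds.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    pct_performing: float    # proportion of incumbents who perform the task
    likely_to_change: bool   # expected to change or disappear soon
    team_task: bool          # requires team rather than individual proficiency
    criticality: float       # mean SME rating, assumed 1-7 scale

def screen(tasks, min_pct=0.30, min_criticality=4.0):
    """Keep only tasks that pass all four screening criteria."""
    return [t for t in tasks
            if t.pct_performing >= min_pct          # (1) enough incumbents
            and not t.likely_to_change              # (2) stable
            and not t.team_task                     # (3) individual proficiency
            and t.criticality >= min_criticality]   # (4) critical or important

candidates = [  # hypothetical task data
    Task("Load howitzer", 0.85, False, False, 6.2),
    Task("Operate obsolete radio", 0.60, True, False, 5.0),
    Task("Emplace howitzer (crew drill)", 0.90, False, True, 6.8),
    Task("Prepare range card", 0.20, False, False, 5.5),
    Task("Police area", 0.95, False, False, 2.1),
]
print([t.name for t in screen(candidates)])
```

Each rejected example fails exactly one criterion, which makes it easy to see which filter removed it; in practice, recording the reason for each rejection would help the SME review.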


If criticality data from the proponent schools are not available to augment the basic CODAP data, we will secure them from MOS incumbents and their supervisors. In general, prime candidate tasks will be those that are difficult and important and performed by a large proportion of incumbents. The data will be obtained using a modified Nominal Group Technique conducted by a member of the project staff. SME will be asked to discuss and then independently rate a set of tasks for their criticality/centrality to the accomplishment of the MOS/duty position job. Fifteen SME per group (MOS) will be required, since experience with the technique indicates that a much smaller group represents too limited a perspective, and discussion becomes unwieldy in a much larger one.
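After the group discussion, the independent ratings must be combined into one criticality score per task. The plan does not say how. A simple choice, sketched below, is the mean across the 15 SME, with the standard deviation kept as a rough flag for disagreement that the group might revisit. The rating scale, the 1.5-point disagreement cutoff, and the ratings themselves are hypothetical.

```python
from statistics import mean, stdev

def summarize_ratings(ratings_by_task, disagreement_sd=1.5):
    """ratings_by_task: task -> list of independent SME ratings.
    Returns (task, mean, sd, disagreement flag) rows ranked by mean rating."""
    rows = []
    for task, ratings in ratings_by_task.items():
        m, sd = mean(ratings), stdev(ratings)
        rows.append((task, round(m, 2), round(sd, 2), sd > disagreement_sd))
    return sorted(rows, key=lambda row: row[1], reverse=True)

ratings = {  # hypothetical ratings from 15 SME on a 1-7 scale
    "Load howitzer":      [7, 6, 7, 6, 7, 7, 6, 7, 6, 7, 7, 6, 7, 7, 6],
    "Prepare range card": [5, 2, 6, 1, 7, 3, 6, 2, 5, 7, 1, 4, 6, 2, 5],
    "Police area":        [2, 3, 2, 2, 3, 2, 1, 2, 3, 2, 2, 3, 2, 2, 3],
}
for row in summarize_ratings(ratings):
    print(row)
```

A task such as "Prepare range card," with a middling mean but widely spread ratings, is flagged for further discussion rather than simply accepted or dropped on its mean.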

Project staff will categorize the tasks that survive the four-stage filter according to their functional content. We will then select tasks randomly to represent the proportion falling within each functional category. Our best guess is that the output will be a list of about 30 tasks.
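Selecting tasks randomly in proportion to each functional category is proportional stratified sampling. The sketch below assumes the categories have already been assigned by project staff and a target of about 30 tasks; it uses largest-remainder rounding so the category quotas sum exactly to the target (the rounding rule is our assumption; the plan does not specify one). Category names and counts are hypothetical.

```python
import random
from math import floor

def allocate(counts, target):
    """Largest-remainder allocation of `target` tasks in proportion to counts."""
    total = sum(counts.values())
    exact = {c: target * n / total for c, n in counts.items()}
    quota = {c: floor(x) for c, x in exact.items()}
    leftover = target - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1
    return quota

def stratified_sample(tasks_by_category, target=30, seed=None):
    """Randomly draw each category's quota of tasks."""
    rng = random.Random(seed)
    counts = {c: len(ts) for c, ts in tasks_by_category.items()}
    quota = allocate(counts, min(target, sum(counts.values())))
    return {c: rng.sample(ts, quota[c]) for c, ts in tasks_by_category.items()}

# Hypothetical numbers of screened tasks per functional category.
categories = {
    "weapons": [f"W{i}" for i in range(22)],
    "vehicle": [f"V{i}" for i in range(12)],
    "communications": [f"C{i}" for i in range(8)],
    "first aid": [f"F{i}" for i in range(3)],
}
print(allocate({c: len(ts) for c, ts in categories.items()}, 30))
picked = stratified_sample(categories, target=30, seed=1)
```

With 45 screened tasks, the exact shares are 14.67, 8, 5.33, and 2, so the single leftover slot goes to "weapons," giving quotas of 15, 8, 5, and 2. Fixing `seed` makes a draw reproducible for later audit.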

Job  analysis  of  MOS  A  will  be  completed  by  the  10th  project  month. 

Activity 5.2.4 Complete task description of MOS A. The intent of this step is to describe in detail how each MOS task selected in the previous steps is performed. The task descriptions will consist of the task elements, task conditions, and standards, and will be developed from information in relevant MOS school lesson plans, task descriptions generated by RCA in the Baseline Skills Project, SQT notices and tests, SM, Field Manuals, and Technical Manuals.

We will assess the sufficiency of each task description against these questions:

(1) Does the task statement describe observable and measurable behavior?

(2) Is the task the same if conditions vary?

(3) Are performance standards stated?

(4) Are performance standards appropriate to the duty position or skill level of the soldier?

(5) Are initiating stimuli identified?

(6) Are concluding stimuli identified?

(7) Is the use of references, job aids, or memory aids part of the task?

(8) Are all task steps or essential task elements listed?

(9) Is the level of description consistent and conclusive?
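Several of these questions can be checked mechanically once a task description is stored in a structured form, leaving the judgmental ones (for example, whether standards suit the skill level, or whether the level of description is consistent) to staff and SME review. The record layout, field names, and function below are hypothetical, offered only as a sketch of such a screen.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskDescription:
    statement: str
    conditions: List[str] = field(default_factory=list)
    standards: List[str] = field(default_factory=list)
    initiating_cue: Optional[str] = None
    concluding_cue: Optional[str] = None
    elements: List[str] = field(default_factory=list)  # task steps

def unconfirmed_items(td: TaskDescription) -> List[str]:
    """List checklist questions the record cannot confirm.

    Only the mechanically checkable questions (3), (5), (6), and (8) are
    covered; the rest require staff or SME judgment.
    """
    problems = []
    if not td.standards:
        problems.append("(3) no performance standards stated")
    if not td.initiating_cue:
        problems.append("(5) initiating stimulus not identified")
    if not td.concluding_cue:
        problems.append("(6) concluding stimulus not identified")
    if not td.elements:
        problems.append("(8) no task steps listed")
    return problems

td = TaskDescription(  # hypothetical task record
    statement="Perform operator maintenance on an M16A1 rifle",
    conditions=["Given an M16A1 rifle and cleaning kit"],
    standards=["Rifle passes a function check"],
    initiating_cue="Scheduled maintenance or after firing",
    elements=["Clear", "Disassemble", "Clean", "Lubricate", "Reassemble",
              "Perform function check"],
)
print(unconfirmed_items(td))
```

Here the only gap reported is the missing concluding stimulus; a description that passes this screen would still go to the SME review described next.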

The completed detailed task descriptions will be reviewed by SME. The SME will be two mid-level NCO instructors with recent troop experience. As mentioned before, evaluation of task data by an informed few is preferred to the opinion of many who are marginally informed. Two qualified SME who can check on one another's oversights and biases are a manageable number to work with. Thus, one project staff member who is familiar with our task descriptions will meet with the SME to review the descriptions task by task. Where conflict exists, we will require a compromise between the SME. For this review we will need two SME for each MOS for two days (Tables 5.2.1 - 5.2.4).

The output of this step will be an approved detailed description of each task. These task descriptions will be the primary information used in developing performance measures (Subtask 3). They will be completed by the 12th project month.




Activity 5.2.5 Conduct behavioral analyses. In addition to the foregoing, we propose to conduct behavioral analyses (Borman, Dunnette & Johnson, 1974; Borman, Hough & Dunnette, 1976) of the MOS selected for investigation. The objectives of these analyses are: (a) to define in comprehensive, behavioral terms the performance requirements of these MOS; and (b) to develop rating scales that may be used to gather special for-research-only ratings to serve, in turn, as criterion performance scores in the predictor validation research. The procedure to accomplish the first objective is described below. Development of the rating scales is discussed in Subtask 3 with the other performance measures.

These job-specific rating scales are distinguished from those in Task 4 in that the latter are directed at Army performance in general. This distinction is conceptually clear; whether it holds up in practice remains to be seen. A separate set of scales may emerge for each MOS, or they may tend to converge toward a single set applicable to all MOS. But in either case, the Task 5 rating scales, in contrast to those in Task 4, are to be derived from behavioral incidents specific to MOS job performance.

Generate performance examples. As a first step in the behavioral analysis
of a given MOS, we will identify soldiers and their supervisors (NCO) to
participate in a series of one-day workshops to generate performance
examples. Experience tells us that (a) we need about 1,000 performance
examples to be sure the job performance domain has been comprehensively
defined, and (b) we can expect to get an average of about 10 usable
examples from each soldier. Thus, a total of 100 participants per MOS will
probably be needed. Participants should have at least two years'
experience in their MOS and should be those most fluent in oral and written
expression. To keep the groups to a manageable size, we propose conducting
six such one-day workshops with about 16 participants in each. (These
procedures and numbers are justified in Task 4, pp. 4-23 to 4-27.)
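The participant and workshop figures above follow from two rules of thumb. As a minimal sketch (the function names are illustrative and not part of the plan), the arithmetic can be restated in Python with the plan's own estimates as defaults:

```python
from math import ceil

def participants_needed(target_examples=1000, examples_per_soldier=10):
    """Soldiers needed to yield the target pool of usable performance examples."""
    return ceil(target_examples / examples_per_soldier)

def average_workshop_size(participants, workshops=6):
    """Average attendance per one-day workshop if participants are spread evenly."""
    return participants / workshops

n = participants_needed()           # 1000 / 10 -> 100 soldiers per MOS
size = average_workshop_size(n)     # 100 / 6 -> about 16.7 per workshop
print(n, round(size, 1))            # 100 16.7
```

The evenly spread average of about 16.7 is in line with the plan's "about 16 participants"; six workshops of exactly 16 would give 96, slightly short of 100.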

At each workshop, project staff will describe briefly the purpose of the
research and then train participants to write behavioral examples. Next,
participants will be asked to write examples of effective and ineffective
job performance based on their experience with individuals in their own
MOS. These examples take the form of short "stories" or vignettes about
individuals performing on the job. Soldiers writing the behavioral
examples will be encouraged to attend to the entire performance domain for
their MOS when thinking about examples to write.

Edit performance examples. The next step in the behavioral analysis is to
edit the performance examples into a common format and to content-analyze
them to form preliminary performance dimensions. Once the dimensions are
developed and defined, we will have them reviewed by the COR and a small
number of persons knowledgeable about the MOS. This review will ensure
that the dimensions make sense, are worded properly, and exhaust the target
performance domain.

Review performance examples. At this point, we will administer by mail the
edited performance examples and dimensions to the workshop participants.
This step is designed to ensure that the performance scales are meaningful
to persons knowledgeable about the target job. Specifically, job incumbents
and/or their supervisors will review edited behavioral examples and
make two judgments about each. First, they will sort each example into
one of the dimensions according to its content. Second, they will rate the
effectiveness level it reflects (e.g., 1 = very ineffective to 7 = very
effective). This procedure will point up ambiguities, if any exist, in the
dimensional system or in individual behavioral examples. The result of
this step will be a set of performance dimensions, well defined in terms of
observable behavior. These data will then be analyzed to develop the final
rating scales, as discussed in Subtask 3. The procedures for conducting
behavioral analyses and developing anchored rating scales are described in
more detail in the Task 4 Research Plan.
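The two judgments collected by mail lend themselves to a simple per-example summary, in the spirit of the retranslation step used for behaviorally anchored scales. The Python sketch below is illustrative only: it assumes each example arrives with a list of judges' dimension assignments and a list of 1-7 effectiveness ratings, and the retention thresholds (at least 60% agreement on the dimension, rating standard deviation no greater than 1.5) are hypothetical values, not figures taken from this plan.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical screening thresholds, assumed for illustration only.
MIN_AGREEMENT = 0.60   # share of judges placing the example in its modal dimension
MAX_RATING_SD = 1.50   # spread of effectiveness ratings on the 1-7 scale

def screen_example(sorts, ratings):
    """Summarize judges' dimension sorts and 1-7 effectiveness ratings for one example."""
    dimension, count = Counter(sorts).most_common(1)[0]
    agreement = count / len(sorts)
    spread = stdev(ratings) if len(ratings) > 1 else 0.0
    return {
        "dimension": dimension,
        "agreement": agreement,
        "mean_rating": mean(ratings),
        "rating_sd": spread,
        "retain": agreement >= MIN_AGREEMENT and spread <= MAX_RATING_SD,
    }

# Placeholder dimension labels; real labels come from the content analysis.
clear = screen_example(["Dimension 1"] * 4 + ["Dimension 2"], [6, 6, 7, 5, 6])
vague = screen_example(
    ["Dimension 1", "Dimension 2", "Dimension 3", "Dimension 1", "Dimension 2"],
    [2, 6, 4, 7, 1],
)
print(clear["retain"], vague["retain"])   # True False
```

Examples that fail either check are the kind of ambiguity the mail review is meant to expose; whether such examples are rewritten or dropped would be settled during scale development.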

The behavioral analysis of MOS A will be completed by the end of the 13th
project month, and the analysis for MOS B by the 18th month.

Activity 5.2.6 Conduct job and task analyses on MOS B. The same essential
activities as described for MOS A will be repeated for MOS B. We expect
MOS B will comprise five MOS. The job and task analytic work will be
divided between HumRRO and AIR (PDRI will do the behavioral analyses under
Activity 5.2.5). The estimated time frames are (dates are FY):

(1) Collect MOS B-specific information: 1 Jul - 30 Aug 1983

(2) Conduct job and task analyses on MOS B: 1 Sep 1983 - 15 Jan 1984

(3) Complete task descriptions of MOS B: 16 Jan - 21 Feb 1984




Activity 5.2.7 Conduct job and task analyses on MOS A' and MOS B'.

Although the analysis of second-tour MOS groups A' and B' will be conducted
after the work on MOS groups A and B has been substantially completed,
there will be considerable overlap between the efforts. First, the
conceptual thinking and the practical lessons of conducting the analysis on
MOS groups A and B will necessarily affect the way that the A' and B'
analyses are carried out. Second, the consideration of differences between
skill levels 1 and 2 within the selected MOS will be an implicit part of the
initial analysis.

The job and task analyses for MOS A' and B' are scheduled for completion by
30 June 1986.

Support requirements for Subtask 5.2. Soldier support requirements for
job-task and behavioral analysis of MOS A, MOS B, MOS A' and MOS B' are
shown in Tables 5.2.1 through 5.2.4.


Table 5.2.1

Soldier Support Requirements for MOS A
Job-Task and Behavioral(a) Analyses

                                                 Soldiers
Purpose                                   MOS    SL     Number   Days Per Person
Review task distribution across           13B    3-5    3        1
  duty positions                          64C    3-5    3        1
                                          71L    3-5    3        1
                                          95B    3-5    3        1
Assess criticality of tasks               13B    3-5    15       1
  (if necessary)                          64C    3-5    15       1
                                          71L    3-5    15       1
                                          95B    3-5    15       1
Review task descriptions                  13B    3-5    2        2
                                          64C    3-5    2        2
                                          71L    3-5    2        2
                                          95B    3-5    2        2
Provide critical incidents(b) and         13B    2-5    100      2
  judgments for scale development         64C    2-5    100      2
                                          71L    2-5    100      2
                                          95B    2-5    100      2

(a) Skill Level 1 and Skill Level 2 behavioral analysis data obtained at the
same time.

(b) SL1 soldiers with two years in service may be substituted for some of the
SL2 soldiers.


Table 5.2.2

Soldier Support Requirements for MOS B
Job-Task and Behavioral(a) Analyses

                                                   Soldiers
Purpose                                   MOS      SL     Number        Days Per Person
Review task distribution across           5 MOS    3-5    3 per MOS     1
  duty positions
Assess criticality of tasks               5 MOS    3-5    15 per MOS    1
  (if necessary)
Review task descriptions                  5 MOS    3-5    2 per MOS     2
Provide critical incidents(b) and         5 MOS    2-5    100 per MOS   2
  judgments for scale development

(a) Skill Level 1 and Skill Level 2 behavioral analysis data obtained at the
same time.

(b) SL1 soldiers with two years in service may be substituted for some of the
SL2 soldiers.


Table 5.2.3

Soldier Support Requirements for MOS A'
Job-Task Analyses

                                                 Soldiers
Purpose                                   MOS    SL     Number   Days Per Person
Review task distribution across           13B    3-5    3        1
  duty positions                          64C    3-5    3        1
                                          71L    3-5    3        1
                                          95B    3-5    3        1
Assess criticality of tasks               13B    3-5    15       1
  (if necessary)                          64C    3-5    15       1
                                          71L    3-5    15       1
                                          95B    3-5    15       1
Review task descriptions                  13B    3-5    2        2
                                          64C    3-5    2        2
                                          71L    3-5    2        2
                                          95B    3-5    2        2

Table 5.2.4

Soldier Support Requirements for MOS B'
Job-Task Analyses

                                                   Soldiers
Purpose                                   MOS      SL     Number        Days Per Person
Review task distribution across           5 MOS    3-5    3 per MOS     1
  duty positions
Assess criticality of tasks               5 MOS    3-5    15 per MOS    1
  (if necessary)
Review task descriptions                  5 MOS    3-5    2 per MOS     2
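As a cross-check on Tables 5.2.1 through 5.2.4, the soldier-days implied by each table can be tallied by multiplying soldiers by days per person. The Python sketch below is illustrative only; it counts the "if necessary" criticality row as always required, so the totals are upper bounds, and it reflects the fact that the A' and B' tables carry no separate behavioral-analysis row (footnote a notes that Skill Level 1 and 2 behavioral data are obtained at the same time).

```python
# Per-MOS rows from Tables 5.2.1-5.2.4: (purpose, soldiers per MOS, days per person).
JOB_TASK_ROWS = [
    ("Review task distribution across duty positions", 3, 1),
    ("Assess criticality of tasks (if necessary)", 15, 1),
    ("Review task descriptions", 2, 2),
]
BEHAVIORAL_ROW = ("Provide critical incidents and judgments for scale development", 100, 2)

# MOS group -> (number of MOS, whether the behavioral-analysis row applies)
GROUPS = {"A": (4, True), "B": (5, True), "A'": (4, False), "B'": (5, False)}

def soldier_days_per_mos(include_behavioral):
    """Sum soldiers x days per person over the rows that apply to one MOS."""
    rows = JOB_TASK_ROWS + ([BEHAVIORAL_ROW] if include_behavioral else [])
    return sum(soldiers * days for _, soldiers, days in rows)

def soldier_days_by_group():
    return {group: n_mos * soldier_days_per_mos(behavioral)
            for group, (n_mos, behavioral) in GROUPS.items()}

totals = soldier_days_by_group()
print(totals)                  # {'A': 888, 'B': 1110, "A'": 88, "B'": 110}
print(sum(totals.values()))    # 2196
```

Almost all of the load comes from the critical-incident workshops (100 soldiers for two days per MOS), which is why the behavioral analyses dominate the support request for MOS A and B.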



Subtask 5.3 Develop Performance Measures

Work on this subtask will proceed from results of Subtasks 5.1, 5.2, and
5.4. Specifically, the inputs will consist of recommended performance
measurement techniques from the literature review in Subtask 5.1, the
review of existing measures in Subtask 5.4, and the designated MOS, tasks,
task descriptions and behavioral analyses from Subtask 5.2. Newly
developed performance measures (covering both task-specific and more
general job dimensions) constitute the output of this subtask.

Activity 5.3.1 Prepare troop support requests. Requests for support in
developing the SL1 measures will be in the first Troop Support Request
(TSR), submitted by the end of the 8th project month. Similar support for
developing the Batch B and the second enlistment term measures will be
included in later TSRs.


Activity  5.3.2  Prepare  research  and  development  plans  for  MOS  A 
performance measures. Planning will be done and reported in three phases:
rationale  for  new  performance  measures,  procedures  for  development,  and 
methods  for  evaluation. 

The central question guiding performance measure development is: Given
limited resources and access to a soldier for some fixed length of time,
what aspects of job behavior should be measured, by what methods, and in
how many replications, in order to obtain the maximum amount of reliable
data on the quality and efficiency of methods for measuring job-specific
criterion performance? A series of guidelines will be established to match
measures with tasks.


A detailed description of the procedures to be followed in developing each
proposed performance measure will be included, along with examples of each
type. In addition, the plans will suggest techniques for evaluating
reliability, validity, cost and usefulness. The plans will be submitted to
the ARI COR for evaluation at the end of the ninth project month. Review
comments and recommendations will be followed by revisions, with final
plans ready for implementation by the end of the eleventh month.

Activity 5.3.3 Develop MOS A performance measures. Three types of
measures are planned: hands-on performance tests, performance-oriented
knowledge tests, and behaviorally based ratings. A fourth type of measure,
computer-mediated knowledge tests, will be developed to the extent
feasible.

Hands-on  performance  tests.  Development  work  begins  with  the  task 
descriptive  data  and  proceeds  through  four  steps: 

(1)  Determine  scoring  approach  (process,  product,  or  combination) 

(2) Develop process items

(3) Develop product items

(4) Develop scorer's testing instructions.

The completed test package, which will consist of all tasks to be tested
hands-on in a skill level, will be pilot tested with representative scorers
and soldiers. The purpose of this is twofold. The first is to ensure
that the test can be administered as designed in a field environment. The
second is to determine scorer reliability.




Field acceptability will be checked by selecting a representative unit that
contains the incumbents to be tested. Although the testing will be on a
relatively small scale, the ability of the unit to support the tryout with
the needed equipment, test site and scorers will give indications of the
feasibility of the support specified.


Interscorer reliability will be established for each test by using a set of
four representative scorers who score the performance of six
representative soldiers. The percentage of agreement will be calculated as
the number of actual agreements divided by the number of possible
agreements. Any item on which there is disagreement among scorers will be
discussed and considered for revision. The revision is likely to take one
of four forms: the conditions may be changed to make a behavior more
observable; a scoring aid may be added to allow more accurate measurement
of a product; the scoring instructions may be expanded to clarify the
actions for the particular circumstance; or the item may be phrased more
precisely. In addition, the scorer training materials will be revised to
emphasize the procedure for scoring the item. All tests that include items
revised because of low interrater agreement will be tried out again in
another interrater reliability pilot test. If an item cannot be revised to
produce an acceptable agreement level, it will be deleted from the
performance test.
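The agreement index described above can be computed item by item. The Python sketch below assumes each hands-on item is scored GO/NO-GO and counts every pair of scorers observing the same soldier as one possible agreement, so four scorers (6 pairs) and six soldiers give 36 possible agreements per item; that pairwise reading of "possible agreements" is an assumption, not a definition given in the plan. Following the plan, any item with less than complete agreement is flagged for discussion.

```python
from itertools import combinations

def percent_agreement(item_scores):
    """Pairwise agreement for one item.

    item_scores[s] holds the GO (True) / NO-GO (False) marks each scorer gave
    soldier s; every pair of scorers watching the same soldier is one
    possible agreement (an assumed reading of the plan's definition).
    """
    agreements = possible = 0
    for marks in item_scores:
        for a, b in combinations(marks, 2):
            possible += 1
            agreements += int(a == b)
    return agreements / possible

def items_to_discuss(test_scores):
    """Names of items showing any scorer disagreement (agreement below 100%)."""
    return [name for name, scores in test_scores.items()
            if percent_agreement(scores) < 1.0]

# Hypothetical data: four scorers, six soldiers, two items.
unanimous = [[True] * 4 for _ in range(6)]
split = [[True] * 4 for _ in range(5)] + [[True, True, True, False]]
scores = {"item 1": unanimous, "item 2": split}
print(percent_agreement(split))   # 33 of 36 pairs agree, about 0.917
print(items_to_discuss(scores))   # ['item 2']
```

A single dissenting scorer on one soldier costs three of the 36 pairs, which is enough to send the item back for discussion under the plan's rule.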

In  addition  to  the  interrater  reliability  data,  subjective  data  on 
acceptability  and  feasibility  will  be  collected  from  scorers  and 
examinees.  Examinees  will  be  asked  whether  they  think  their  performance  on 
the  tests  was  a  fair  measure  of  their  ability  to  do  the  task  on  the  job. 


Scorers will be asked if the standards and tolerances of the tests are
consistent with their experience, whether all necessary equipment was
available, whether the scoring instructions were clear, and whether
additional guidance was needed in response to any unanticipated incidents.

Performance-oriented knowledge tests. Paper-and-pencil tests of job
knowledge, when compared to hands-on tests, not only provide wider coverage
of the job domain at less cost in time and resources but also can prove
acceptably valid for many job-tasks if the test questions are methodically
anchored in task procedures. The sequence of decisions and actions to be
followed in that anchoring hinges on the causes of failure to perform the
task correctly. Each key behavior within the task will be analyzed
rationally by staff and SME for potential causes of a failure:

(1) Is it because the soldier doesn't know WHERE to perform?

(2) Is it because the soldier doesn't know WHEN to perform a step?

(3) Is it because the soldier doesn't know WHAT the end result looks like?

(4) Is it because the soldier doesn't know HOW to execute the behavior?

For each likely cause of error, project staff and SME first will identify
the correct location, sequence, product, or procedure; then describe it in
words or pictures; then frame a question; and, finally, select real-world
response alternatives (distractors) to complete the test item. The
important point is that by considering these four questions about each
aspect of task performance, we can pinpoint both what is important to ask
in a knowledge test of task performance and how to ask it. This procedure
helps prevent test questions that are so often used merely because they
are easy to ask.
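The four-question procedure amounts to generating one item skeleton per likely failure cause and then filling it in. The Python sketch below models that bookkeeping. The class and function names are hypothetical, the behavior and answer strings are placeholders rather than real task content, and the rule of at least three distractors is an assumption (typical of four-option multiple-choice items), not a requirement stated in the plan.

```python
from dataclasses import dataclass, field
from enum import Enum

class Cause(Enum):
    """The four questions asked about each key behavior."""
    WHERE = "location"
    WHEN = "sequence"
    WHAT = "product"
    HOW = "procedure"

@dataclass
class ItemDraft:
    behavior: str
    cause: Cause
    correct: str                                     # correct location, sequence, product, or procedure
    stem: str = ""                                   # question framed around the correct answer
    distractors: list = field(default_factory=list)  # real-world wrong alternatives

    def is_complete(self, min_distractors=3):
        """Hypothetical completeness rule: a stem plus enough distractors."""
        return bool(self.stem) and len(self.distractors) >= min_distractors

def draft_items(behavior, likely_causes):
    """One item skeleton per failure cause judged likely for a key behavior."""
    return [ItemDraft(behavior, cause, correct) for cause, correct in likely_causes.items()]

# Placeholder content; real entries would come from project staff and SME.
items = draft_items("key behavior X", {Cause.WHEN: "step order S", Cause.HOW: "procedure P"})
items[0].stem = "Which step comes next in key behavior X?"
items[0].distractors = ["step order T", "step order U", "step order V"]
print([item.cause.name for item in items])                # ['WHEN', 'HOW']
print(items[0].is_complete(), items[1].is_complete())     # True False
```

Keeping the failure cause attached to each item makes it easy to see, for any task, which of the four questions the test actually covers.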

It should be noted that these knowledge tests differ in purpose and kind
from those to be developed in Task 3. The latter are intended chiefly as
training achievement measures to be administered before and after
training. They are designed, moreover, to be comprehensive in the sense of
addressing all tasks in the Soldier's Manual, but will do so by testing
only a sample of task elements. The Task 5 knowledge tests, in contrast,
are designed as potential substitutes for hands-on criterion measures.
They will be developed only for a sample of job-tasks but will cover all
essential performance elements for those tasks. Correlations between Task
3 and Task 5 knowledge tests, where common task elements are measured, will
provide interesting data on the two approaches to job knowledge testing as
well as on trends in performance from school to the job.
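One way to obtain the Task 3/Task 5 correlations is to score each soldier on only those items that measure the shared task elements in each test and then correlate the paired subscores. The Python sketch below assumes that design and uses a plain Pearson product-moment coefficient; the item and element labels and all scores are invented for illustration.

```python
from math import sqrt

def common_subscore(item_scores, element_of_item, common_elements):
    """Sum one soldier's item scores over items that measure shared task elements."""
    return sum(score for item, score in item_scores.items()
               if element_of_item[item] in common_elements)

def pearson_r(x, y):
    """Pearson correlation between paired score lists (one pair per soldier)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data
sub = common_subscore({"q1": 1, "q2": 0, "q3": 1},
                      {"q1": "e1", "q2": "e2", "q3": "e3"},
                      {"e1", "e3"})
print(sub)                                  # 2
task3 = [4, 6, 7, 9]                        # Task 3 subscores, four soldiers
task5 = [5, 6, 8, 9]                        # Task 5 subscores, same soldiers
print(round(pearson_r(task3, task5), 3))    # 0.965
```

Restricting both tests to the common elements keeps the comparison about measurement approach rather than about which elements each test happened to sample.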

Behaviorally based rating scales. These scales, developed from the
behavioral analyses described in Activity 5.2.5, are aimed specifically at
those aspects of the job that are particularly resistant to measurement by
hands-on or knowledge tests, and they are designed to be free of the rating
errors normally observed in conventional rating scales. The brief
descriptions of performance (vignettes) obtained from soldiers will be
edited and then rated by soldiers as to the effectiveness of the behavior
described. The data are used to prepare scales that pertain to dimensions
of MOS-specific performance described by soldiers as important. Points
along the scales are illustrated with the vignettes to help raters
compare the behavior of the ratee with these benchmark behaviors. Thus,
soldiers themselves provide the data to identify the dimensions, as
well as to describe, by example, the various levels of performance. The
scales will be tried out in the field test along with the other measures.
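Illustrating points along a scale can be pictured as choosing, for each dimension, vignettes whose average effectiveness rating sits near each scale point, so that they serve as benchmarks. The Python sketch below assumes the vignettes have already been edited and rated on the 1-7 scale; the half-point tolerance and the function name are assumptions for illustration, not the procedure the plan will actually use.

```python
from statistics import mean

def scale_anchors(rated_vignettes, scale_points=range(1, 8), tolerance=0.5):
    """Choose benchmark vignettes for each scale point of each dimension.

    rated_vignettes: (dimension, vignette text, list of 1-7 effectiveness ratings).
    A vignette anchors a point only if its mean rating lies within `tolerance` of it.
    Returns {dimension: {scale_point: vignette_text}}.
    """
    by_dimension = {}
    for dimension, text, ratings in rated_vignettes:
        by_dimension.setdefault(dimension, []).append((mean(ratings), text))
    anchors = {}
    for dimension, rated in by_dimension.items():
        anchors[dimension] = {}
        for point in scale_points:
            m, text = min(rated, key=lambda pair: abs(pair[0] - point))
            if abs(m - point) <= tolerance:
                anchors[dimension][point] = text
    return anchors

# Placeholder vignettes and ratings
vignettes = [
    ("Dimension 1", "vignette L", [1, 2, 1]),
    ("Dimension 1", "vignette M", [4, 4, 5]),
    ("Dimension 1", "vignette H", [7, 7, 6]),
]
print(scale_anchors(vignettes))
# {'Dimension 1': {1: 'vignette L', 4: 'vignette M', 7: 'vignette H'}}
```

Points with no nearby vignette are simply left unanchored here; in practice gaps of that kind would suggest collecting or writing additional examples.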

Computer-mediated knowledge tests. As a possible fourth method of testing,
we propose to explore adapting the performance-oriented knowledge tests to
a computer medium. If possible, these tests will be developed for a subset
of the tasks covered by the job knowledge tests. Computer-mediated tests
are used in other testing applications, but are seldom used in job
proficiency assessment. Potential advantages of the approach are numerous
and significant. The management of examinee response data is more efficient
and reliable. Examinee responses to questions or test stimuli are recorded
and processed instantly, enabling dynamic management of the test sequence
and rapid, reliable reduction and reporting of test results.
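The response-management advantages claimed for the computer medium can be illustrated with a small administration loop that scores each answer as soon as it is entered, records response time, and adjusts the item sequence on the fly. Everything here is an assumption for illustration: the `Item` class, the rule of inserting follow-up items after a miss, and the scripted responses that stand in for a soldier at a terminal.

```python
import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Item:
    item_id: str
    key: str                                        # keyed (correct) response
    follow_ups: list = field(default_factory=list)  # hypothetical: shown only after a miss

def administer(items, get_response, clock=time.monotonic):
    """Present items, score each response instantly, and branch on misses."""
    queue = deque(items)
    log = []
    while queue:
        item = queue.popleft()
        start = clock()
        correct = get_response(item) == item.key
        log.append({"item": item.item_id, "correct": correct,
                    "seconds": clock() - start})
        if not correct:
            queue.extendleft(reversed(item.follow_ups))  # follow-ups come next
    return log

def summarize(log):
    """Immediate score report for one examinee."""
    right = sum(entry["correct"] for entry in log)
    return {"administered": len(log), "correct": right,
            "percent": round(100 * right / len(log), 1)}

# Scripted responses stand in for a soldier at the keyboard.
items = [Item("q1", "b", [Item("q1-diag", "c")]), Item("q2", "a")]
answers = {"q1": "a", "q1-diag": "c", "q2": "a"}
log = administer(items, lambda item: answers[item.item_id])
print([entry["item"] for entry in log])   # ['q1', 'q1-diag', 'q2']
print(summarize(log))   # {'administered': 3, 'correct': 2, 'percent': 66.7}
```

The same per-item timing record is what items with reaction-time constraints would need if such tasks prove suitable for the computer format.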

Computer-mediated testing will be explored chiefly from the standpoint of
its feasibility in terms of relative cost, range of task behaviors
accommodated, and usability of the medium by soldiers. If considered
promising, further development of computer-mediated versions of the
performance tests will proceed in four phases:

(1) Prepare test items

(2) Identify system components

(3) Develop software

(4) Pre-test system




The majority of test items will consist of the performance-oriented
knowledge items adapted directly to the computer-mediated format. Each
job-task selected for testing will be reexamined in an effort to identify
any tasks or task elements with reaction-time constraints or visual motion
cues that can be simulated effectively in a computer-mediated format.


Skill Level 1 measures for the first four MOS (MOS A) will be drafted and
submitted  for  approval  by  the  end  of  the  14th  project  month. 


Activity 5.3.4 Plan and develop MOS B measures. Skill Level 1 measures
for  the  remaining  five  MOS  (MOS  B)  will  be  developed  following  the  same 
procedure  outlined  for  MOS  A.  These  measures  will  be  drafted  and  submitted 
for  approval  by  the  21st  project  month. 

Activity 5.3.5 Plan and develop MOS A' and B' measures. Measures of
second  tour  performance  (Skill  Level  2)  will  be  developed  for  all  nine  MOS 
(MOS  A'  and  B')  and  submitted  for  approval  by  the  end  of  the  48th  project 
month. 

Support requirements for Subtask 5.3. A test developer and SME can develop
a draft test (either hands-on or knowledge) for a typical task in about
four days, with the SME working half time on one test and half time on
another. This includes reviewing the task analysis, developing the hands-on
scoresheet and scorer instructions (or a sufficient number of knowledge
items), and conducting tryouts. Our plan is to develop knowledge tests for
all 30 tasks in each MOS/SL, and hands-on tests for half of these, making
45 tests in all. (These numbers are estimates arrived at by considering
potential trade-offs among number of MOS, number of tasks per MOS, number
of measures per task, soldier support requirements, project design
objectives, and project resources.) Two SME days for each of 45 tests is a
total of 90 SME days for each of the MOS/SL shown in Tables 5.3.1 and 5.3.3.
The SME should be at least one Skill Level higher than that for which the
test is being developed. Four additional soldiers, who are similar to the
SME but who have not participated in the development of the tests, will be
required to review the knowledge items and to serve as hands-on test scorers
in a scorer reliability study for each MOS/SL. Participation will total
three days per scorer in order to cover five replications of the 15
hands-on tests. A minimum of six soldiers will be required per MOS/SL for
preliminary tryouts of the instruments. Each subgroup of six should come
from the MOS/SL being tested, but span a range of experience and, if
possible, proficiency. They will be needed for three days to take the
hands-on tests plus about a third of the knowledge tests. A second group of
four scorers and six soldiers will be needed for two days to try out the
revised instruments.
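Per MOS/SL, the support estimates in this paragraph add up as sketched below. The tally is illustrative only: it simply totals the stated figures and does not allow for scheduling, travel, or any overlap between scorers and SME.

```python
# Planning figures stated for one MOS/skill level (MOS/SL)
DAYS_PER_TEST = 4        # developer and SME draft one test in about four days
SME_SHARE = 0.5          # the SME splits that time between two tests
KNOWLEDGE_TESTS = 30     # all 30 tasks
HANDS_ON_TESTS = 15      # half of the 30 tasks

def support_days_per_mos_sl():
    """Soldier-days of support implied by the stated figures for one MOS/SL."""
    tests = KNOWLEDGE_TESTS + HANDS_ON_TESTS           # 45 tests
    days = {
        "SME": tests * DAYS_PER_TEST * SME_SHARE,      # 2 SME days per test
        "scorers": 4 * 3,                              # four scorers, three days each
        "tryout soldiers": 6 * 3,                      # six soldiers, three days each
        "revised tryout": (4 + 6) * 2,                 # second group, two days each
    }
    days["total"] = sum(days.values())
    return days

print(support_days_per_mos_sl())
# {'SME': 90.0, 'scorers': 12, 'tryout soldiers': 18, 'revised tryout': 20, 'total': 140.0}
```

The SME line reproduces the plan's figure of 90 SME days; the remaining 50 soldier-days cover scorer reliability checks and tryouts.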

The number and kinds of soldiers needed to support development of MOS A
performance measures are shown below in Table 5.3.1. The major items of
equipment that we may need access to in order to develop the MOS measures
are listed in Table 5.3.2.

The number and kinds of soldiers needed to support development of
performance measures for MOS B, A', and B' are shown in Table 5.3.3.


Table 5.3.1

Soldier Support Requirements for Developing
MOS A Performance Measures

                               Estimated          Soldiers
Purpose                        Date        MOS    Number    Days Per Person
Support test development
Check scorer reliability       Jul 83


Table 5.3.2

Possible Equipment Support Requirements for Developing
MOS A Performance Measures

13B             M101A1 or M109 cannon
                Direct fire telescope
                Panoramic telescope
                Collimator

Common Soldier  M60 machinegun
                M16A1 rifle
                M203 grenade launcher
                M18A1 Claymore mine, inert

64C             1/4-ton truck, utility, M151 series
                Truck tractor, 5-ton, M818 series
                Semitrailer, stake and platform, 12-ton, M127 series

71L             Typewriter

95B             1/4-ton truck, utility, M151 series
                FM radio set



Table 5.3.3

Soldier Support Requirements for Developing
MOS B, A' and B' Performance Measures

Purpose                           MOS

MOS B
  Support test development        5 MOS
  Check scorer reliability        5 MOS

MOS A'
  Support test development        13B, 64C, 71L, 95B
  Check scorer reliability        13B, 64C, 71L, 95B

MOS B'
  Support test development        5 MOS
  Check scorer reliability        5 MOS


Subtask 5.4 Review and Evaluate Existing MOS-Specific Measures

The goal of this subtask is to compile existing MOS A performance measures
and to evaluate them with respect to their utility as indicators of
job-specific performance. SQT for the MOS of interest are the most obvious
example. These performance measures exist and, if they meet certain
criteria of acceptability, would obviate the necessity of developing a new
test. Also, a "good" test developed in the school setting, while not
appropriate as a job-specific measure, might be efficiently adapted for use
as such.

The work on this subtask can begin when MOS A has been identified and the
tasks  selected  for  testing. 

Activity 5.4.1 Compile existing measures. Measures that expand our
coverage of the criterion space for an MOS without adding to testing time
are not expected to be numerous. For a measure to be useful in this
regard, it should be both comprehensive, in that it covers a significant
sector of the criterion space, and already in operational use, so that
scores are available for the soldiers under investigation in this project.
The two most obvious measures that meet these standards are the SQT and the
Enlisted Evaluation Report (EER). We will query MILPERCEN and TRADOC
regarding other such operational performance measures for the job
specialties in MOS A.


Finding available measures that can be adapted to our purposes is much more
likely. Here we are looking for existing tests or rating instruments,
pertaining to tasks or behaviors identified for measurement in Subtask 5.2,
that can be used to save development time or otherwise enhance the set of
measures developed in Subtask 5.3. We will screen two major sources:

1. TRADOC EPMS Network. Performance measures developed
within the TRADOC Enlisted Personnel Management System,
chiefly by the Directorates of Training Developments
(DTD) within the schools, will be compiled for review.
These are measures developed typically for use in
center, school and unit training evaluations, but which
hold promise for adaptation to the broader purpose of
performance appraisal.

2. ARI-Contractor Research Projects. Since many Army
personnel research projects involve development of
performance tests, ratings of performance, and other
criterion measures, this is a potentially rich source of
performance measures. Reports of research conducted by and
for the Army Research Institute that are relevant to the
target MOS will be identified and examined for useful
performance measures. For example, research of the type
produced by Shields, Hanser, Williams, and Popelka
(1981) may provide some measures related to Skill Level
2 performance (although most of their measures are
Army-wide).




Activity 5.4.2 Evaluate existing measures. Once identified, each relevant
performance measure will be evaluated according to its intended use. Any
intact, comprehensive, MOS-wide measure like an SQT will be evaluated in
two ways. First, we will determine whether the measure can be administered
and the scores made available for the job incumbents of interest and in a
time frame consistent with other measurement. Second, we will decide if
the measure is qualitatively acceptable and useful. This second stage of
evaluation in turn will entail compiling and analyzing two kinds of data.
First, we will need information pertaining to a measure's development
(whether the measure or set of measures was developed according to sound
practices). For example, we would need to determine whether prescribed
procedures for SQT development have been followed. A second type of data
to be examined is that resulting from operational administration of the
measure. Summary statistics on SQT, for example, are available from the
SQT Management Division (SMD) of the Army Training Support Center (ATSC).
These data consist of detailed subtest information and item statistics and
also indicate the tests' overall difficulty and range of performance
produced.
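The plan relies on summary statistics already produced by SMD. If item-level responses were available instead, the difficulty and range indices mentioned here could be recomputed along the lines of the Python sketch below, which uses the classical proportion-correct difficulty index; the response matrix is hypothetical.

```python
def item_difficulties(responses):
    """Classical item difficulty: proportion of examinees answering each item correctly.

    responses: one row per examinee, each a list of 0/1 item scores.
    """
    n = len(responses)
    return [sum(column) / n for column in zip(*responses)]

def total_score_range(responses):
    """Lowest and highest total scores, a rough index of the range of performance."""
    totals = [sum(row) for row in responses]
    return min(totals), max(totals)

# Hypothetical 0/1 scores for four examinees on three items
responses = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0]]
print(item_difficulties(responses))   # [0.75, 0.75, 0.25]
print(total_score_range(responses))   # (1, 3)
```

Items that nearly everyone passes or fails contribute little to spreading soldiers apart, which is one reason overall difficulty and score range matter when judging an existing test as a criterion.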

Evaluation of measures considered for adaptation or use with the new set
will be done similarly, but standards for accepting a measure will be more
stringent. Generally, these standards would require evidence of
development procedures consistent with those set forth in Subtask 3, in
addition to persuasive data on the measure's demonstrated validity and
reliability. Such rigorous standards will eliminate all but a few existing
job-task measures from outright adoption; in other cases we may be able to
rework an available measure into a new one suitable for tryout and field
testing.


Subtask 5.5 Plan and Implement a Field Test of MOS-Specific Performance
Measures 

The field test will provide data to assess existing and new measures as
criteria of MOS-specific performance. Elements of analysis include
psychometric considerations, content coverage, practical utility and costs.


Three field tests are planned. The first, scheduled for project months
19-21, is to test the SL1 measures for the first batch of four MOS. The
second, scheduled for months 28-30, is to test SL1 measures for the
remaining five MOS. The final field test, scheduled for months 53-55, is to
evaluate SL2 measures for all nine MOS.

Activity 5.5.1 Prepare outlines of test plans. The first activity in this
subtask will be to initiate research coordination and troop support
requests. This coordination will be effected by providing outlines of each
test plan. The outline will spell out necessary administrative information
and specify test objectives. In outline form, the plan communicates the
nature and objectives of the test to scientific and military personnel who
are responsible for approving and providing the troop support.

The  outline  will  be  followed  up  by  a  specific  test  design  statement 
describing  the  scientific  research  aspects  of  the  test.  It  will  specify 
the  conditions  under  which  the  performance  measures  must  be  tested,  the 
experimental  design,  the  data  requirements,  the  analyses  planned,  and 
proposed  use  of  findings. 


Activity 5.5.2 Prepare troop support requests. The required request will
be submitted early enough to allow a minimum of six months for processing,
plus time for schedule constraints (training schedules, holidays, summer
National Guard training support, National Training Center exercises, etc.).


The data collection coordinator will submit the troop support requests in
accordance  with  the  Master  Project  Plan  and,  at  the  discretion  of  the  COR, 
will  follow  through  with  briefings,  telephone  calls,  and  supplementary 
materials  to  the  Army  managers  responsible  for  the  troops  requested. 


The troop support request for the first field test will be submitted in the
8th month; for the second field test, in the 20th month; and for the third
field test, in the 44th month.
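The submission months above can be checked against the six-month minimum processing time stated for these requests. The Python sketch below, illustrative only, computes the lead time from each request to the first month of its field test (months 19, 28, and 53); it does not model the additional schedule constraints listed above.

```python
MIN_PROCESSING_MONTHS = 6   # stated minimum processing time for a request

# Field test -> (project month the request is submitted, first project month of testing)
SCHEDULE = {
    "first field test": (8, 19),
    "second field test": (20, 28),
    "third field test": (44, 53),
}

def lead_times(schedule):
    """Months between submitting the request and the start of testing."""
    return {name: start - submitted for name, (submitted, start) in schedule.items()}

def too_short(schedule, minimum=MIN_PROCESSING_MONTHS):
    """Field tests whose requests leave less than the minimum processing time."""
    return [name for name, months in lead_times(schedule).items() if months < minimum]

print(lead_times(SCHEDULE))  # {'first field test': 11, 'second field test': 8, 'third field test': 9}
print(too_short(SCHEDULE))   # []
```

Every request clears the six-month minimum, with the second field test leaving the smallest margin (two months) for schedule conflicts.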


Activity 5.5.3 Prepare detailed test document. This document guides the day-to-day operations of the field test. It presents to the Army personnel who support the research a description of their role by time and place. It also contains the data collection instruments and the procedures for on-site quality control of the data.


There are several different audiences for this product. The data collection coordinator, test site manager, Army test control officers, and COR use the entire document to guide and coordinate the data collection effort. Other users include test control officers, representatives of the supporting local units, hands-on test managers, and research assistants at the specific sites. To facilitate these uses, we will prepare local editions tailored to each test site and provide a table of contents that identifies the sections for different users.

The  test  document  for  MOS  A  will  be  submitted  to  the  COR  for  review  and 
approval  by  the  end  of  the  fifteenth  month. 

Activity 5.5.4 Conduct field tests of MOS-specific measures. Implementation of the field test comprises three phases of activity: advance preparation on-site, execution of the test, and closure. We presume that the COR will forward the test plans to cognizant Army agencies, including those that will provide the troop support. We will provide supplementary materials and coordination to facilitate the implementation. Advance preparation on-site requires approximately three days per test site for:

(1)  briefings  to  the  Commanders  of  the  units  supplying  the 
troops  to  clarify  the  test  objectives,  activities,  and 
requirements, 

(2)  examination  of  the  test  site,  equipment,  supplies  and 
special  requirements  for  the  data  collection  and  set-up 
of  the  hands-on  test  stations, 

(3)  training  of  the  test  administrators  and  scorers,  and 

(4)  a  dry  run  of  the  test  procedures. 




Successful test implementation requires that an officer of the supporting unit be assigned as test officer (e.g., a representative of the G3 or S3 office) and that a staff of NCOs implement the controls for the flow of troops through the data collection procedures. We will review the logistics plan and test schedule with the unit's administrative staff, and we will conduct the training of all civilian and military scorers and other data personnel. In the training phase, a dry run of the procedures will follow the data collection schedule and use the personnel and locations designated for the test. At the first test site, the dry run will evaluate the procedures as well as train the personnel. The training will focus on the handling of problem situations, particularly those requiring remediation by the scientific staff.

Because of the scope of the data collection activities for Task 5, this task will have a data collection coordinator who is highly skilled in Army field data collection. The data collection coordinator will manage the various field and cohort test implementations. Each test site will have a test site manager who supervises all of the research at an Army post during a field or cohort test. The test site manager is responsible for controlling the quality and flow of the data until delivery to the longitudinal research data base manager.

In addition to the test site manager, a project staff member, supported by a research assistant, will serve as the hands-on test manager for an MOS. (The exact number of MOS tested per site will depend on the distribution of incumbents by MOS and installation; once we know the exact MOS to be addressed in a field test, we can select the sites so that travel and personnel resources are consolidated.) The hands-on test managers will have sufficient experience with field data collection to manage the Army personnel who serve as hands-on test scorers and others who assist with administration of the research.

Military personnel will serve as hands-on test scorers. The hands-on test scorers need to be familiar with the MOS tasks being tested. We prefer to have a cadre of NCO personnel for each MOS in the research. However, if designation of such a cadre is not possible, we are prepared to train military personnel at each test site to score the hands-on performance tests.

If  we  must  train  hands-on  test  scorers  at  each  site,  we  propose  to  use  the 
existing  system  of  test  control  personnel  who  administer  the  SQT  systems. 
This  approach  will  minimize  the  preparation  needed  for  some  of  the  research 
procedures  and  will  reduce  the  burden  on  the  Army. 

We plan to conduct performance tests at several stations, to administer the set of measures for an entire MOS, and to complete testing of 30 soldiers in each MOS in 2-1/2 days.* Data collection for five MOS (150 incumbents per MOS) can be accomplished in two weeks at one site, with an additional week for site set-up, training, and collection of data from some of the group-administered tests. Support requirements include 25 NCOs/officers (an average of 5 per MOS).

*During the first field test, data will also be collected using the Army-wide scales and the knowledge and prototype measures developed by Tasks 4 and 3. The combined administration time will be two and a half days (see pages 4-44 and 3-53).




We will also gather data other than those directly pertaining to the job performance measures. For example, we will examine the time and resources required for the various types of tests, the burden on the Army and on the soldiers, potential invasions of privacy, test credibility, and other aspects of test acceptability. We will examine the relationship of the higher-cost, individually administered performance tests to the less costly test types.


Closure  of  the  field  test  has  two  major  objectives.  First,  we  will  assure 
the  quality  of  the  field  test  data  prior  to  leaving  the  site  by  identifying 
missing  data  points  and  obtaining  the  data  as  indicated.  Second,  we  will 
debrief  supporting  units,  reemphasizing  the  value  of  their  contributions 
and  providing  what  feedback  we  can  on  performance  that  may  be  requested  by 
soldiers  and  commanders.  We  will  probably  return  to  the  same  posts  to 
conduct  the  cohort  tests.  Our  return  will  be  facilitated  by  the  good  will 
of  the  unit  personnel. 


Activity 5.5.5 Analyze field test data and report results. During the field tests we will obtain data on a variety of measures that tap different aspects of MOS-specific performance. The data will be obtained for samples of approximately 150 soldiers not in the target cohort. The types of data that will be available are indicated below:


(1) hands-on performance test scores

(2) performance-oriented knowledge test scores

(3) overall rating of job performance (supervisor, peer, self)

(4) ratings on behaviorally anchored scales (supervisor, peer, self)

(5) performance measure acceptability rating

(6) job experience data (recency and frequency of task performance), obtained through administration of a short questionnaire

(7) measurement cost data

(8) SQT scores (if available)

(9) other (e.g., demographic information)

(10) computer-mediated knowledge test scores.


We will plan the data collection and analysis with the advice and assistance of Task 1 staff, who will also participate in the analyses. Several major kinds of analyses will be conducted, as discussed below.

Clean data and develop descriptive statistics. Statistical analyses will begin with the data verification procedures described in Task 1 and conducted by the analysts in charge of the LRDB. We plan to check data sheets for missing data before the instruments leave the test site, but some instances of missing data are bound to occur. These will be rectified by means of the special PROC IMPUTE missing data routine. Once the data are "cleaned," standard descriptive statistics will be computed for all variables, both for the samples (by MOS) and for subsamples (e.g., by ethnic group and gender). These will include means, variances, ranges, frequencies, etc. Appropriate transformations will be applied to seriously skewed or otherwise non-normal distributions to render the data suitable for further analyses.
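The plan does not say which descriptives or transformations will be used. As a minimal sketch, assume the cleaned data arrive as records with hypothetical field names (for example, `mos`, `gender`, `score`). The Python below computes per-subgroup means, variances, ranges, and skewness. It then applies a log(x + 1) transform whenever a hypothetical skewness cutoff of 1.0 is exceeded; both the transform and the cutoff are illustrative only.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def skewness(xs):
    """Moment-based sample skewness g1 (no small-sample correction)."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def describe(xs):
    """Descriptives for one variable within one group (needs at least 2 values)."""
    return {
        "n": len(xs),
        "mean": mean(xs),
        "variance": variance(xs),  # sample (n - 1) variance
        "range": (min(xs), max(xs)),
        "skew": skewness(xs),
    }


def describe_by(records, group_key, value_key):
    """Descriptives for each subgroup, e.g. by MOS or by gender (hypothetical keys)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    return {g: describe(vals) for g, vals in groups.items()}


def maybe_log_transform(xs, skew_cutoff=1.0):
    """Hypothetical rule: log(x + 1) for strongly right-skewed, non-negative data."""
    if min(xs) >= 0 and skewness(xs) > skew_cutoff:
        return [math.log(x + 1) for x in xs]
    return list(xs)
```

Keeping the transformation rule in one function means the rule eventually chosen can be applied, and documented, in a single place.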


Determine reliability. Internal consistency approaches to assessing reliability are generally inappropriate for job proficiency tests, since job skills and abilities are not homogeneous but tend to vary from one area of job performance to another. This is equally true for hands-on and knowledge tests. Test-retest methods of estimating reliability are more appropriate, but often difficult to implement because of the additional personnel and time demands of retesting. Two problems are associated with test-retest approaches to performance test reliability. One is the practical difficulty of extending the retest interval: it is difficult to get soldiers back to the test site a second time. This argues for retesting on the day of the first test rather than after several days, even though an interval of days between test administrations is preferred to one of hours. A second problem is that taking a performance test changes the examinee, more so than with other types of tests. Seeing the results of their actions in the course of performing a task can give soldiers cues for changing their behavior on retesting.

We plan to use two approaches to reliability estimation in field testing the hands-on tests. First, we intend to obtain retest data by attempting to get at least half of the tested soldiers back after an interval of several days. To retest all soldiers on separate days would nearly double the time and resources planned for field testing; yet to reduce the number of tasks tested so that there would be time to retest later the same day would result in too few soldiers per task. So, we plan to request that all soldiers tested return several days later for retesting. Attrition will probably reduce the original number (150) substantially, resulting in perhaps 80-100 soldiers on whom retest data are available.
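The retest estimate can be sketched as a Pearson correlation computed only over the soldiers who returned. This way, attrition shows up directly as a smaller effective N (for example, 80-100 of the original 150). The dictionary layout used here, mapping a soldier ID to a total hands-on score, is an assumption for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def retest_reliability(first, retest):
    """Correlate first and second administrations for soldiers present at both.

    first, retest: dicts mapping a (hypothetical) soldier ID to a total score.
    Soldiers lost to attrition are simply dropped. Returns (r, n_paired).
    """
    ids = sorted(first.keys() & retest.keys())
    r = pearson_r([first[i] for i in ids], [retest[i] for i in ids])
    return r, len(ids)
```

The sketch reports the paired N with r because the precision of the reliability estimate depends on how many soldiers actually returned.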


As a second approach to estimating the reliability of the performance tests, we plan to explore ways in which test performance can be partitioned for evaluation in an analysis-of-variance context. If assumptions of independence and randomization can be met, factors such as time, task type, test station, scorer, etc., may be identified or introduced as variates in order to examine the generalizability of test performance over such sources of variance (Cronbach et al., 1972).
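One concrete form of this partitioning is a one-facet generalizability study. In this design every soldier is scored by every scorer (persons crossed with scorers, one score per cell). The sketch below estimates the person and residual variance components from the two-way ANOVA mean squares. It then forms the relative G coefficient for the mean over n_s scorers, sigma_p^2 / (sigma_p^2 + sigma_res^2 / n_s). Treating scorer as the single facet is an assumption: time, task type, or test station could be substituted, and multi-facet designs need additional components.

```python
def one_facet_g_study(scores):
    """Persons x scorers G study with one observation per cell.

    scores[p][s] is the score person p received from scorer s.
    Returns (var_person, var_residual, g_coefficient). The residual component
    confounds the person-by-scorer interaction with random error.
    """
    n_p, n_s = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_s)
    person_means = [sum(row) / n_s for row in scores]
    scorer_means = [sum(row[s] for row in scores) / n_p for s in range(n_s)]

    ss_person = n_s * sum((m - grand) ** 2 for m in person_means)
    ss_scorer = n_p * sum((m - grand) ** 2 for m in scorer_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_person - ss_scorer

    ms_person = ss_person / (n_p - 1)
    ms_resid = ss_resid / ((n_p - 1) * (n_s - 1))

    # Expected-mean-square solution; a negative estimate is truncated at zero.
    var_person = max(0.0, (ms_person - ms_resid) / n_s)
    var_resid = ms_resid
    denom = var_person + var_resid / n_s
    g = var_person / denom if denom > 0 else float("nan")
    return var_person, var_resid, g
```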


A split-half technique will be used to estimate the reliability of the knowledge tests. Items pertaining to each task may be divided into two halves, and the scores for the halves separately totaled and correlated over examinees, with the "stepped-up" (Spearman-Brown) correlation providing the estimate of test reliability.
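The split-half procedure can be sketched as follows, assuming the following layout:

- item scores are held in a matrix of examinees by items;
- each task's items are listed by column index.

The plan does not specify the splitting rule. Alternating items within each task between the two halves is one reasonable way to balance task content across halves. The stepped-up value uses the Spearman-Brown formula for a test of double length, 2r / (1 + r).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores, task_items):
    """item_scores[e][i]: score of examinee e on item i.

    task_items: dict mapping task -> list of item column indices (hypothetical
    layout). Items within each task alternate between half A and half B.
    """
    half_a, half_b = [], []
    for items in task_items.values():
        half_a.extend(items[0::2])
        half_b.extend(items[1::2])
    totals_a = [sum(row[i] for i in half_a) for row in item_scores]
    totals_b = [sum(row[i] for i in half_b) for row in item_scores]
    return spearman_brown(pearson_r(totals_a, totals_b))
```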

Estimating the reliability of job ratings by supervisors, peers, and incumbents themselves is somewhat less critical. We are less concerned with the reliability of individual raters than with the reliability or constancy of the ratings of a given soldier. Since we will obtain two peer and two supervisor ratings for each soldier rated, agreement among raters can be used to estimate the reliability of soldier ratings. For the self-ratings, it would not be appropriate to routinely ask soldiers to repeat the self-rating task. To the extent that the rating scales can be paired (i.e., each pair of rating scales is viewed as covering the same or very similar performance tasks), we can use the correlation between the paired scales to place a lower bound on the reliability of the two scales. Otherwise, a lower bound on reliability will be estimated using the multiple correlation of the self-ratings with the peer and supervisor ratings and with the performance and knowledge measures themselves.
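With two peer (or two supervisor) ratings per soldier, a minimal sketch of the rater-agreement estimate has two steps:

1. Correlate rater 1 with rater 2 across soldiers. This estimates the reliability of a single rater.
2. Apply the general Spearman-Brown formula, k r / (1 + (k - 1) r), to obtain the reliability of the average of k raters.

Which rating counts as "first" and which as "second" for a soldier is arbitrary here. That is an assumption the plan does not address. Intraclass correlations are a common alternative that avoids it.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown_k(r_single, k):
    """Reliability of the mean of k parallel raters, given single-rater reliability."""
    return k * r_single / (1 + (k - 1) * r_single)


def rating_reliability(pairs, k=2):
    """pairs: one (rater_1_rating, rater_2_rating) tuple per soldier rated.

    Returns (single-rater reliability, reliability of the k-rater mean).
    """
    r_single = pearson_r([a for a, _ in pairs], [b for _, b in pairs])
    return r_single, spearman_brown_k(r_single, k)
```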

The reliability of other measures (job experience data and background demographics) will be checked for a sample of soldiers by obtaining equivalent information from supervisors.

Determine validity. We will conduct analyses of content, construct, and concurrent validity. Content validation is largely a matter of making certain that test elements match the task elements revealed by the task analysis. Ensuring content validity is an inherent part of the test development process. Thus, prior to the field test we will, with the assistance of SME, have carefully compared each proposed item (including performance standards, sequence, and test conditions) with the task analysis data, to assure that all items are part of the job requirements.

Construct validity of the job-specific criterion measures will be examined in two ways. First, item, task, or dimension scores (depending on the instrument) will be intercorrelated and factor analyzed, and the resulting factor structures will be compared between criterion measures. Second, where measures of performance on the same tasks were obtained on the same soldiers by more than one method, a multitrait-multimethod analysis will be performed in an attempt to identify criterion constructs that are stable across methods of measurement. This will clearly be possible for those tasks ("traits") tested by both hands-on and knowledge methods. It is also possible that some existing measure of task performance, such as SQT scores, will be available for field test participants as an additional method to be introduced in the analysis. And, though more remote, it may even be possible to include some part of the behavioral ratings as still another method, if tested task performance can be mapped readily into one or more dimensions of the behavioral rating scales.

The primary technique used in the analysis of construct validity will be the estimation of multitrait-multimethod parameters through the use of LISREL V models. We will rely on assistance from Task 1 staff, who have considerable experience in the application of such models and who will coordinate their use across tasks. The basic approach of these models is to view each observed measure as resulting from a combination of underlying construct (trait) and method variables plus some error variation (estimated by the reliabilities). The analysis then produces estimates of the relative importance of each underlying variable for each observed measure and of the overall fit of the model (Joreskog & Sorbom, 1981).
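The confirmatory LISREL models themselves are beyond a short sketch. As a simpler first look, the inspection below follows the spirit of Campbell and Fiske's original multitrait-multimethod matrix. It is a swapped-in technique, not the LISREL analysis planned here. From scores keyed by (task, method), it computes two averages:

- the average monotrait-heteromethod ("convergent") correlation;
- the average heterotrait-monomethod correlation.

If task constructs are stable across methods, convergent correlations should clearly exceed the heterotrait-monomethod ones. The key layout and method labels are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mtmm_summary(scores):
    """scores: dict mapping (trait, method) -> scores over the same soldiers.

    Returns (mean convergent r, mean heterotrait-monomethod r).
    """
    convergent, hetero_mono = [], []
    for ((t1, m1), v1), ((t2, m2), v2) in combinations(scores.items(), 2):
        r = pearson_r(v1, v2)
        if t1 == t2 and m1 != m2:
            convergent.append(r)   # same task, different method
        elif t1 != t2 and m1 == m2:
            hetero_mono.append(r)  # different task, same method
    return sum(convergent) / len(convergent), sum(hetero_mono) / len(hetero_mono)
```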

The results of the multitrait-multimethod analyses will also address the extent to which the various job performance measures tap distinct versus equivalent criterion dimensions. Because Task 3 and Task 4 performance measures will be available for the same soldiers, the field tests will provide the first empirical opportunity to investigate the dimensionality of the criterion space. The relevant issues in such an analysis are discussed at length in the Task 1 research plan. These analyses will attempt to determine the number of different measures required to adequately cover the criterion space in the main cohort administrations. The models being identified or developed in Task 1 to address this issue in the main cohort analyses will, to the extent possible, be applied here.




Concurrent validities will be determined from intercorrelations between the "indirect" measures (i.e., knowledge tests and ratings) and the more "ultimate" measures (i.e., hands-on performance tests). Evidence concerning the relationship of the indirect measures to the direct ones will be evaluated so that we can recommend, as appropriate:

(1) research use of the direct (more costly) measures,

(2) operational implementation of the indirect (less costly) measures, or

(3) a mix of direct and indirect measures that maximizes the cost-benefit of the measurement system.

Search for bias. The first test for differences will be a comparison of group mean scores by ethnic group, by sex, and by ethnic group and sex combined. Analysis of variance will be the statistical test of choice. If significant differences are obtained, we must ascertain whether they appear to be due to bias (see the extensive discussion of the bias issue in the Task 1 plan). If there is a suspicion of bias, we will examine the measures to determine whether changes can be made that would reduce or eliminate it. Possible changes include simplifying instructions or options in written tests, if language appears to be the problem, and providing special tools or mechanical assists on a performance test that examinees report using, even if they are not required by the technical manual. Finally, it may be necessary to eliminate some items from the scoring system if no other way can be found to equalize apparently biased scores.


Other analyses. All of the analyses described above, when applied to task-level performance data, assume that the performance scores are valid indicators; that is, that low proficiency is indicative of low MOS-specific job performance. Clearly, therefore, if an incumbent does not perform some of the target tasks frequently (or has not done so recently), the meaning of a low test score is ambiguous. To assess the degree to which such an artifact may underlie the obtained performance data (and affect reliability, validity, and bias), we intend to replicate all of the statistics and analyses discussed above, using task-level performance data that have been adjusted by appropriate covariate procedures to control for recency and frequency of task performance.
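The plan does not specify the covariate procedure. One way to carry out the adjustment is sketched below under stated assumptions. Each task score is regressed on the soldier's recency and frequency of task performance (from the job experience questionnaire). The score is then shifted to what it would be at the sample-average recency and frequency. This is an ordinary least squares, ANCOVA-style adjustment written in plain Python. The numeric coding of recency and frequency is hypothetical.

```python
def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] for y ~ 1 + rows, via normal equations."""
    X = [[1.0, *r] for r in rows]
    p = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    c = [sum(x[i] * yy for x, yy in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the p x p system A b = c.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for k in range(col + 1, p):
            f = A[k][col] / A[col][col]
            for j in range(col, p):
                A[k][j] -= f * A[col][j]
            c[k] -= f * c[col]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def adjust_for_experience(scores, experience):
    """scores[i]: task score; experience[i]: (recency, frequency) for soldier i.

    Returns scores adjusted to the sample-mean recency and frequency.
    """
    b = ols(experience, scores)
    means = [sum(col) / len(col) for col in zip(*experience)]
    return [
        s - sum(bj * (x - m) for bj, x, m in zip(b[1:], row, means))
        for s, row in zip(scores, experience)
    ]
```

Because the adjusted scores are on the original scale, the reliability, validity, and bias analyses can be rerun on them unchanged.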

Prepare report. We will report the results of the field test of new and existing MOS-specific measures three months after the completion of the field test, and will revise the report based on comments from the COR. The report will have a management section that summarizes the types of information of use to Army personnel managers, and a scientific section in a format suitable for ARI publication or for submission to a psychological journal.

Support requirements for Subtask 5.5. The number of soldiers per MOS/SL (N = 150) being requested for the field test is considered, in a statistical sense, minimally acceptable. Correlations will be computed among test methods, subtests, items, rating scales, job experience data, and other variables. With a sample of 150, correlations of .14 or larger will test as significantly different from zero (one-tailed test at the .05 level) using standard statistical tests for the significance of sample correlation coefficients. Similarly, this sample size will enable us to be 95 percent confident that an estimate of the proportion passing a measure is accurate within plus or minus .08. An N of 150 will also enable us to detect unanticipated, infrequent events or problems that may occur in connection with field testing instruments. Consider a problem that occurs for only one soldier in 50 but could seriously affect the larger cohort administration. It will have a 95 percent chance of being detected. The probability of the event never occurring in our sample is .98 (the probability of the event not occurring for a single soldier) raised to the power of 150, which is just under .05.
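The three sample-size figures can be checked with a few lines of standard-library Python. The sketch has three parts:

- The correlation threshold uses the normal critical value in place of t. This makes it very slightly smaller than the exact t-based value, which is about .135 for N = 150.
- The margin for a proportion uses the worst case, p = .5.
- The detection probability is 1 - (1 - rate)^N.

The function names are illustrative.

```python
import math
from statistics import NormalDist


def critical_r_one_tailed(n, alpha=0.05):
    """Smallest r significant at alpha (one-tailed), with z standing in for t.

    Solves t = r * sqrt(n - 2) / sqrt(1 - r**2) for r.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    return z / math.sqrt(z * z + (n - 2))


def proportion_margin(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


def detection_probability(n, rate):
    """Chance that an event affecting a fraction `rate` of soldiers appears at least once in n."""
    return 1 - (1 - rate) ** n


if __name__ == "__main__":
    n = 150
    print(f"critical r:        {critical_r_one_tailed(n):.3f}")
    print(f"margin (p = .5):   {proportion_margin(n):.3f}")
    print(f"P(detect 1 in 50): {detection_probability(n, 1 / 50):.3f}")
```

Run as a script, it prints 0.134, 0.080, and 0.952. These agree with the .14, plus or minus .08, and 95 percent figures given above.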

Soldier support requirements for the field tests of MOS A, MOS B, MOS A', and MOS B' are shown in Tables 5.5.1 through 5.5.4.




Table 5.5.1

Soldier Support Requirements for MOS A Field Test(a)

                                                 Soldiers            Days per
Purpose                               MOS        SL     Number       Person

Provide liaison with tested units     Officer           1 per post   12
Coordinate equipment and subjects     NA         3-5    2 per post   12
Score hands-on tests                  13B        2      3 per post   12
                                      64C        2      3 per post   12
                                      71L        2      3 per post   12
                                      95B        2      3 per post   12
Subjects for measures(b)              13B        1      150          2.5
                                      64C        1      150          2.5
                                      71L        1      150          2.5
                                      95B        1      150          2.5
Supervisor ratings(c)                 13B        3-5    40           1
                                      64C        3-5    40           1
                                      71L        3-5    40           1
                                      95B        3-5    40           1

(a) It is unlikely we can get 150 in each MOS at any post. An average of 5 support personnel per MOS per post will be required. The most efficient approach would be to field test each MOS completely at any given post.

(b) The 2.5 days includes time for the administration of Task 3 and 4 measures.

(c) Supervisor of tested soldiers. This requirement is described in the Task 4 plan (see page 4-44).


Table 5.5.2

Soldier Support Requirements for MOS B Field Test(a)

                                                 Soldiers                   Days per
Purpose                               MOS        SL     Number              Person

Provide liaison with tested units     Officer           1 per post
Coordinate equipment and subjects     NA         3-5    2 per post
Score hands-on tests                  5 MOS      2      3 per post per MOS
Subjects for measures(b)              5 MOS      1      150 per MOS
Supervisor ratings(c)                 5 MOS      3-5    40 per MOS

(a) It is unlikely we can get 150 in each MOS at any post. An average of 5 support personnel per MOS per post will be required. The most efficient approach would be to field test each MOS completely at any given post.

(b) The 2.5 days includes time for the administration of Task 3 and 4 measures.

(c) Supervisor of tested soldiers. This requirement is described in the Task 4 plan (see page 4-44).


Table 5.5.3

Soldier Support Requirements for MOS A' Field Test

                                                 Soldiers            Days per
Purpose                               MOS        SL     Number       Person

Provide liaison with tested units     Officer           1 per post   12
Coordinate equipment and subjects     NA         3-5    2 per post   12
Score hands-on tests                  13B        3      2 per post   12
                                      64C        3      2 per post   12
                                      71L        3      2 per post   12
                                      95B        3      2 per post   12
Subjects for measures                 13B        2      150          2
                                      64C        2      150          2
                                      71L        2      150          2
                                      95B        2      150          2
Supervisor ratings(a)                 13B        4-5    20           .5
                                      64C        4-5    20           .5
                                      71L        4-5    20           .5
                                      95B        4-5    20           .5

(a) Supervisor of tested soldiers.


Table 5.5.4

Soldier Support Requirements for MOS B' Field Test

                                                 Soldiers
Purpose                               MOS        SL     Number

Provide liaison with tested units     Officer           1 per post
Coordinate equipment and subjects     NA         3-5    2 per post
Score hands-on tests                  5 MOS      3      2 per post per MOS
Subjects for measures                 5 MOS      2      150 per MOS
Supervisor ratings(a)                 5 MOS      4-5    20 per MOS

Subtask 5.6: Assemble Old and New Performance Measures Into Composite Sets

Subtask 5.6: Assemble Old and New Performance Measures Into Composite Sets


This subtask has two purposes:

(1) To establish a data base that permits performance measures previously developed in the project to be specified as measures for the same or similar tasks performed in other MOS.

(2) To determine the measure(s) for testing each task in the cohort data collections.


Activity 5.6.1 Develop a data base for comparing tasks. One of the existing weaknesses of the current AOSP is that task analyses are conducted at different sites by different people, with the consequence that inconsistent terminology is fairly common. We will identify commonalities by structuring a matrix that arrays tasks against measurement techniques. The cell entries will consist of new and/or existing performance measures. The total matrix will be capable of being stored in the computer, and hard copy of relevant parts of the matrix would be available.


This activity will consist of three steps:

(1) Collect relevant information and data concerning new, existing, and recommended performance measures for each task.

(2) Develop a consistent descriptive system or terminology to describe these measures and their characteristics in a form suitable for Army-wide use.

(3) Organize the collected information in a data base in accordance with the above system, including appropriate cross-referencing and categorization of information/data.
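The plan does not specify the storage format. A minimal sketch of the task-by-technique matrix assumes a simple in-memory structure:

- Task names are normalized (case and spacing) and mapped through a synonym table to one canonical name. This addresses the terminology inconsistency noted above.
- Each cell holds the list of measures recorded for that task and technique.

The class name, synonym entries, and measure identifiers are hypothetical.

```python
from collections import defaultdict


def _norm(name):
    """Lower-case and collapse whitespace so trivial spelling variants match."""
    return " ".join(name.lower().split())


class TaskMeasureMatrix:
    """Tasks (rows) by measurement techniques (columns); each cell lists measures."""

    def __init__(self, synonyms=None):
        # Hypothetical table mapping variant task names to one canonical name.
        self.synonyms = {_norm(k): _norm(v) for k, v in (synonyms or {}).items()}
        self.cells = defaultdict(list)

    def canonical(self, task):
        key = _norm(task)
        return self.synonyms.get(key, key)

    def add(self, task, technique, measure):
        self.cells[(self.canonical(task), technique)].append(measure)

    def row(self, task):
        """All techniques and measures recorded for one task."""
        t = self.canonical(task)
        return {tech: list(ms) for (tt, tech), ms in self.cells.items() if tt == t}

    def tasks_measured_by(self, technique):
        """Canonical names of tasks that have at least one measure of this type."""
        return sorted({tt for (tt, tech) in self.cells if tech == technique})
```

For example, measures entered under two differently worded task titles end up in the same row once the synonym table links the two titles.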


Activity 5.6.2 Select test methods for tasks. The decisions in this activity center on the question: Which tasks should be tested by what method or measure during the cohort test phases?

The problem is to select test methods so that the total pool of tests has the highest concurrent validity and acceptability consistent with feasibility constraints. The data and experiences resulting from the field tryouts will guide the selection of the test method for each task that is to be tested during the cohort phase. Three criteria are involved in the decision: concurrent validity, acceptability, and feasibility.

Concurrent validity. The primary concern is to maximize concurrent validity as indicated by the coefficients found during the field trials. In practice this is a negative criterion: the tasks with the lowest correlations between the written/computer and the hands-on tests should be measured using hands-on tests. We expect that some differences in correlations will be traceable to physical or psychomotor skills required in the hands-on version that are not mediated by the type of knowledge covered in the written/computer versions.

Acceptability. The second, and secondary, indicator for hands-on testing is the spread between soldiers' expressed preference for the hands-on mode and their preference for the written/computer mode. Other factors being equal, the greater the spread between preference for the hands-on test and preference for the written/computer test, the more likely it is that the task will be tested hands-on. This criterion will help assure that the total test is perceived

Feasibility. The third criterion for selecting tasks for hands-on testing is feasibility. The major consideration is to require only equipment that can be made available for the cohort test. A second consideration is the amount of information to be gained in about one day of testing. Tasks with repetitive operations and extraordinarily time-consuming steps are less likely to be tested hands-on than their "richer" counterparts.
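The plan gives the three criteria in priority order but not a formal decision rule. One hypothetical way to turn them into a procedure is sketched below:

1. Drop tasks whose equipment cannot be made available.
2. Rank the remaining tasks by ascending correlation between written/computer and hands-on scores. Break ties by the larger preference spread for hands-on testing.
3. Assign hands-on testing greedily until a time budget standing in for "about one day of testing" is used up. A time-consuming task that does not fit is skipped.

The field names and the 480-minute default budget are assumptions, not values from the plan.

```python
from dataclasses import dataclass


@dataclass
class TaskEvidence:
    name: str
    r_written_hands_on: float  # field-test correlation, written/computer vs hands-on
    preference_spread: float   # preference for hands-on minus preference for written
    equipment_available: bool  # can the equipment be provided for the cohort test?
    hands_on_minutes: int      # hypothetical station time per soldier


def select_hands_on(tasks, minutes_budget=480):
    """Return the names of tasks assigned to hands-on testing, in selection order."""
    feasible = [t for t in tasks if t.equipment_available]
    # Lowest correlation first (validity); larger preference spread breaks ties.
    feasible.sort(key=lambda t: (t.r_written_hands_on, -t.preference_spread))
    chosen, used = [], 0
    for t in feasible:
        if used + t.hands_on_minutes <= minutes_budget:
            chosen.append(t.name)
            used += t.hands_on_minutes
    return chosen
```

The remaining tasks would then be measured with the written or computer-mediated versions.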

Subtask 5.7: Plan For and Administer MOS-Specific Performance Measures To Main Cohorts

This subtask provides the criterion data for the project. The new and improved measures developed in Subtasks 5.3 and 5.4 and refined in Subtasks 5.5 and 5.6 will be administered to soldiers in the target cohorts. The results will be supplied to staff working on different tasks and will be analyzed in several different ways, depending upon the particular task requirements. Thus, these data will be used as predictor and/or criterion data from the perspectives of Tasks 2, 3, and 4, and as raw input to continued reliability and validity analyses for Task 5.

Activity 5.7.1 Prepare main cohort troop support requests. As the test plans are formulated, we will submit refined requests at least six months before the troops are needed. The elements of the troop support request are the same as for the field test described in Activity 5.5.2.


Activity 5.7.2 Prepare draft data collection plans.

Test plan outlines and detailed test documents for the administration of the performance measures to the main cohorts of enlisted personnel will be produced in a format similar to those developed for the field tests (Activity 5.5.2). One important difference, however, is that we will use the LRDB data on the main cohorts to facilitate our sampling of the cohort personnel by determining the characteristics of the personnel in the selected MOS, their location, and other relevant features. Use of the LRDB to determine personnel locations will be especially beneficial for obtaining representative samples and for portions of the research plan that require repeated measures of soldiers as part of a longitudinal design (e.g., in their second tour).
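The LRDB's actual fields and access method are not specified in this plan. A minimal sketch of using LRDB records to locate cohort members assumes each record carries hypothetical `mos` and `post` fields. It counts soldiers by MOS and installation and ranks the installations within each target MOS. A tally of this kind supports both representative sampling and consolidating site visits.

```python
from collections import Counter, defaultdict


def locate_cohort(records, target_mos):
    """Count cohort members by (MOS, post) and rank posts within each target MOS.

    records: iterable of dicts with hypothetical keys "mos" and "post".
    Returns {mos: [(post, count), ...]}, sorted by descending count, then post name.
    """
    counts = Counter((r["mos"], r["post"]) for r in records if r["mos"] in target_mos)
    by_mos = defaultdict(list)
    for (mos, post), n in counts.items():
        by_mos[mos].append((post, n))
    for posts in by_mos.values():
        posts.sort(key=lambda pn: (-pn[1], pn[0]))
    return dict(by_mos)
```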

Activity 5.7.3 Prepare final data collection plans. The final data collection plans will incorporate comments received from the COR on the draft plans and will reflect feedback information concerning the availability of the troop support requested earlier in the applicable TSR.

Activity 5.7.4 Conduct main cohort data collection. Implementation of the cohort test will be facilitated by our advance knowledge of the location and characteristics of the cohort samples. However, attrition, relocation, and reassignment of the soldiers in the cohort make it difficult to obtain suitable sample sizes, especially for repeated measures. We have organized the cohort data collection under the supervision of a data collection coordinator for the entire project. That manager will have a stable cadre of project personnel who serve as test site managers, hands-on test managers, and research assistants. This organization of the data collection activities and personnel will decrease the impact of the research on the participating units and assure the standardization needed for data quality control. We will institute quality control procedures in the cohort data collection that are similar to those discussed for the field test.


The advance preparation, data collection procedures, and test site closure
in the cohort test will be similar to those in the field tests. The Task 5
measures applied will be those assembled in Subtask 5.6 as the criteria for
the products of Tasks 2, 3, and 4.

We will administer the MOS A and B Skill Level 1 measures to FY83/84 first
term incumbents during the first cohort test. (Our target is to test 650
soldiers in each of the 19 MOS: Task 4 measures will be administered in all
19 MOS and Task 5 measures in 9.) By the time of the second cohort test, in
the 69th project month, we will have prepared and field tested the Skill Level 2
performance measures for all 9 MOS for which specific measures are being
developed. These will be administered along with the Task 4 Army-wide measures
to FY83/84 second tour incumbents (100 per MOS), while Skill Level 1 measures
are administered at the same time to FY86/87 first tour incumbents
(500-550 per MOS).

The final data collection will be in the second tour of the FY86/87 cohort
(the 9th project year), when 100 Skill Level 2 soldiers in each of the MOS
will be tested. The performance measurement and analyses will be
conducted by the same cadre of data collection and analytic personnel
who worked on the data in the preceding major administrations and analyses.


Throughout the cohort tests, the data collection activities will be
coordinated to integrate the research on pre-induction predictors, training
measures, and Army-wide performance measures with the final criterion
measures, and to reduce demands on scientific and Army resources. The following
activities will be carried out by the Task 5 data collection team for other
project tasks during cohort data collection:

(1) Administer, in line with guidance and training given by
Task 2 staff, the four-hour predictor battery (second
cohort test administration only).

(2) Administer the Task 3 job-knowledge tests for all 19 MOS
(first cohort test administration only).

(3) Collect, under the supervision of Task 4 staff, Army-wide
measures and ratings (all cohort administrations), as well as
utility judgments for MOS performance levels.

Activity 5.7.5 Analyze main cohort data and report results. We will
forward the MOS-specific criterion performance data to the LRDB for
processing and analysis. Task 1 staff will be responsible for carrying out
the array of planned validity analyses. Task 5 staff will also
conduct a variety of analyses using the cohort data, but these analyses
will be developmental in nature and designed to improve the quality of the
performance tests.

Many of the analyses described for the field test data (Activity 5.5.5)
will be conducted again on the larger cohort samples. The principal
purpose of these analyses will be to confirm that the hands-on and job
knowledge performance tests, as well as the behavioral and overall rating
scales, are sound, useful, efficient, and cost-effective criterion measures.
We will reestimate reliabilities, compute estimates of concurrent and
construct validity for job knowledge tests and rating scales, make certain
that all of the criterion instruments are in compliance with APA Division
14 guidelines, and factor analyze the correlations among instruments in a
continuing effort to refine the criterion battery. If the analyses identify
any aspects of the measures that warrant improvement, the necessary
modifications will be made after receiving the concurrence of the COR.
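The plan does not say which reliability coefficients will be reestimated. The following is a minimal sketch that assumes coefficient alpha, an internal-consistency estimate, is one of them. It computes alpha for a soldiers-by-items matrix of GO/NO-GO scores. The function name and the five-soldier data set are hypothetical and are used for illustration only.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient (Cronbach's) alpha for a soldiers-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used for items and totals alike, so the
    divisor cancels out of the ratio.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))          # one tuple of scores per item
    item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical GO (1) / NO-GO (0) results: five soldiers by four task steps.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(example), 3))  # prints 0.8
```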

Report results. We will conclude the test of each MOS group in the main
cohort with a report that sets forth the methods, samples, variables,
instruments, and results of the test. The report will present the
component set of measures tested and the results of the analyses (e.g.,
psychometric properties, utility). Especially for MOS groups tested early
in the project, the report will discuss the implications for tests in
subsequent MOS groups (e.g., use of automated devices for data collection,
tasks common to more than one MOS).

The reports will contain management summaries for the portion of the
audience that needs to make decisions based on the validities, utilities,
and other management information. The body of each report will meet the
criteria for publication by ARI or by psychological journals, including
details of research design, theoretical bases, results, and discussion.

Support requirements for Subtask 5.7. In the cohort data collection, we
are concerned with the effect of sample size on two major statistical
issues in addition to those mentioned for the field test. The first of these
is the question of differences in performance or validity for
subgroups of particular interest (women, Blacks, and Hispanics). With an
overall sample size of 650 soldiers in each MOS, we expect to sample, where
appropriate, at least 120 soldiers from each of the key subgroups. With
subgroup samples of this size, observed subgroup differences of 10 percent
in the proportion passing individual items or tasks would be statistically
significant. Since differences of 10 percent in the proportion passing
individual items are of practical significance, it is essential that the
samples be large enough to detect these differences reliably. Second,
more accurate estimates of percent passing and of correlations among measures
are required for cohort measurement than for field testing the measures.
The proposed sample size will give us 95 percent confidence bounds of plus
or minus 4 percentage points in estimates of percent passing, and 95
percent confidence bounds of plus or minus .08 for estimates of
correlations based on the entire sample. While more accurate estimates
would clearly be desirable, this level of accuracy is judged acceptable for
the purposes of the project.
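The plan states these precision figures without showing how they were computed. The sketch below reproduces them under the following assumptions, which are ours rather than the plan's:

- Percent passing uses a normal approximation with p = .50, the widest case.
- Correlations use Fisher's z transformation evaluated at r = 0.
- The subgroup claim uses a two-sided .05 test comparing a 120-soldier subgroup with the remaining 530 soldiers in the MOS.

The function names are hypothetical.

```python
import math

Z95 = 1.96  # two-sided 95 percent normal critical value


def proportion_half_width(n, p=0.5):
    """95% half-width for a percent-passing estimate (normal approximation)."""
    return Z95 * math.sqrt(p * (1 - p) / n)


def correlation_half_width(n):
    """Approximate 95% half-width for r near zero via Fisher's z (SE = 1/sqrt(n - 3))."""
    return math.tanh(Z95 / math.sqrt(n - 3))


def min_detectable_difference(n1, n2, p=0.5):
    """Smallest difference in proportion passing that is significant at .05
    (two-sided) between two independent groups of sizes n1 and n2."""
    return Z95 * math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


N, SUBGROUP = 650, 120
print(f"percent passing:  +/- {100 * proportion_half_width(N):.1f} points")
print(f"correlation:      +/- {correlation_half_width(N):.3f}")
print(f"subgroup vs rest: {100 * min_detectable_difference(SUBGROUP, N - SUBGROUP):.1f} points")
```

Under these assumptions:

- The half-widths come out near 3.8 points and .077, consistent with the rounded figures of 4 points and .08.
- The smallest significant subgroup difference is just under 10 points.
- If two 120-soldier subgroups were compared directly with each other, the same test would require a difference of about 12.7 points at p = .50.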


Soldier support requirements for implementing data collection on SL1
MOS-specific performance measures in the FY83/84 cohort are shown in
Table 5.7.1. Soldier support requirements for implementing data collection on
MOS-specific performance measures in the FY86/87 cohort and SL2 measures in
the FY83/84 cohort are shown in Table 5.7.2. Soldier support requirements
for implementing data collection on SL2 MOS-specific performance measures
in the FY86/87 cohort are shown in Table 5.7.3.


Table 5.7.1

Soldier Support Requirements for Cohort Test I
(Task 5 measures only)

                                     Soldiers
Purpose                              MOS          SL    Number           Total   Days Per Person
Provide liaison with tested units    Officer            1 per site       16      30
Coordinate equipment and subjects    NA           3-5   3 per site       48      10
Score hands-on tests                 9 MOS        2     18 per site      288     30
                                                        [2 per MOS]
Subjects for performance measures    9 MOS        1     365 per site     6000    1
                                                        [42 per MOS]
Supervisor ratings                   Supervisor   3-5   100 per site     1600    .5
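Table 5.7.1 does not report a total workload. As a rough illustration, multiplying Total by Days Per Person and summing the rows gives the soldier-days the first cohort test would require. This assumes that every person listed serves the full number of days shown. The function name is hypothetical.

```python
# Rows of Table 5.7.1: (purpose, total persons, days per person)
TABLE_5_7_1 = [
    ("Provide liaison with tested units", 16, 30),
    ("Coordinate equipment and subjects", 48, 10),
    ("Score hands-on tests", 288, 30),
    ("Subjects for performance measures", 6000, 1),
    ("Supervisor ratings", 1600, 0.5),
]


def soldier_days(rows):
    """Return per-row soldier-days (total x days per person) and their sum."""
    per_row = {purpose: total * days for purpose, total, days in rows}
    return per_row, sum(per_row.values())


per_row, grand_total = soldier_days(TABLE_5_7_1)
for purpose, days in per_row.items():
    print(f"{purpose:<36}{days:>8,.0f}")
print(f"{'Total':<36}{grand_total:>8,.0f}")
```

Under that assumption the total is 16,400 soldier-days, more than half of it for scoring the hands-on tests.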


Table 5.7.2

Soldier Support Requirements for Cohort Test II
(Task 5 measures only)

                                     Soldiers
Purpose                              MOS          SL    Number           Total   Days Per Person
Provide liaison with tested units    Officer            1 per site       16      30
Coordinate equipment and subjects    NA           3-5   3 per site       48      10
Score hands-on tests                 9 MOS        3-5   18 per site      288     30
                                                        [1 per MOS]
SL1 subjects for                     9 MOS        1     315 per site     5000    1
  performance measures                                  [35 per MOS]
SL2 subjects for                     9 MOS        2     56 per site      900     1
  performance measures                                  [6 per MOS]
Supervisor ratings                   Supervisor   3-5   100 per site     1600    .5


Table 5.7.3

Soldier Support Requirements for Cohort Test III

                                     Soldiers
Purpose                              MOS          SL    Number           Total   Days Per Person
Provide liaison with tested units    Officer            1 per site       15
Coordinate equipment and subjects    NA           3-5   3 per site       48      3
Score hands-on tests                 9 MOS        3-5   9 per site       144     7
Subjects for performance measures    9 MOS        2     56 per site      900     1
                                                        [6 per MOS]
Supervisor ratings                   Supervisor   3-5   18 per site      300     .5


Subtask 5.8: Produce Final Technical Report on MOS-Specific Performance
Measures

Reports on MOS-specific performance measures include the reports on the
field and cohort tests for each MOS group, a report describing each of the
instruments and measures, a final technical report, and articles for
publication.

We will prepare the draft final technical report in two parts. The
executive summary will present an overview of the purpose, procedures, results,
and use of the performance measures in a brief text with graphic displays
suitable for managers and Army operational personnel. An expanded
scientific section will present details of methodology, development of the
performance measures, technical problems and their resolution, statistical
analyses, results, interpretations, and use.

This final report will synthesize the research for all of the MOS groups.
It will discuss the value and utility of the two classes of measures in the
project: "ultimate" measures of task performance and "proximal," indirect,
low-cost measures (e.g., pre-induction predictors, school/training
measures). The indirect measures will be described in terms of their
empirical relationships with the "ultimate" measures and with other variables
in the research (e.g., predictors, Army-wide measures).

Task 5 activities per se, including an instrument/measurement report and a
final report, are scheduled to be completed at the end of the 7th project
year. Project A staff will collect the data and conduct the analysis for
the final cohort sample (FY86/87 cohort, second tour). The findings will
be incorporated into an addendum to the final report.


SUMMARY  OF  EXPECTED  OUTCOMES 


The work of Task 5 is expected to produce a number of outcomes that will
have value beyond the basic purpose of the task. Some of these outcomes
will be of scientific interest, applicable to other researchers and other
situations; others will be of practical value, primarily in the military
services but in other applications as well. Most of the outcomes, however,
will have both scientific and operational value, since the scientific
outcomes will, for the most part, be developed to solve an operational
problem. In the paragraphs that follow, we have separated the outcomes
into scientific and operational categories, though even the operational
outcomes would be of interest to other researchers faced with similar
problems.

Scientific  Outcomes 

1. A primary scientific contribution will be the taxonomy of behaviors
and constructs that span the range of Army MOS, together with the techniques
developed to produce the taxonomy. For the first time, such a taxonomy of
human performance will be built on large samples of data from a wide array
of jobs performed by men and women who are relatively homogeneous in
age, training, and experience, but who differ in aptitudes and adaptation
to work and Army life. Future Army research will not have to rely on
meta-analyses of old data collected for different purposes, or on small
heterogeneous samples collected under various sets of instructions or
conditions.


Tasks 5 and 2 will be able to identify the similarities and dissimilarities
in the job activities and the KSAO required to perform those activities.
The longitudinal aspect of the research will reflect changes in the
activities and the KSAO as the soldier moves through the enlisted ranks. Such a
taxonomy will be of benefit for years to come: to both the military and
civilian workforce, to long-range manpower planners, and to researchers.


2. A procedure will be developed for selecting tasks for testing that
addresses many of the questions that have plagued job performance
measurement. The procedure will be a practical approach to selecting tasks for
large-scale job proficiency assessment efforts in both the public and private
sectors. The military services may need to modify the approach to achieve
training benefits, but the procedure developed by this project is likely to
provide a foundation for future measurement programs.

One of the goals of our research is to develop a procedure to select tasks
and measures for the specific purpose of "generalization to other tasks and
MOS." To meet this objective, we will use various techniques,
ratings, categorizations, and taxonomies to analyze a large number of MOS
and tasks. While we will make use of these analyses for the specific
purposes of the project, there are many other potential uses for these
results.

For example, suppose we ultimately decide to develop front-end task
analyses for large numbers of MOS and/or tasks to be used to select MOS/tasks
for further research or generalization to Army-wide operations.
Such analyses could clearly be used by appropriate audiences in MOS other
than the ones we select to investigate in detail. More generally, our task
analysis procedures will be made "public" in the sense that they will be
user-oriented, definitions will be carefully operationalized, and
procedures will be standardized so that they could be used wherever desired
in the Army.

In addition, the specific decision rules we adopt for our purposes (i.e.,
task selection for representativeness and generalization) will have
direct application wherever and whenever similar decisions are made. For
example, two ubiquitous problems for Army trainers are how to select tasks
for inclusion in training and how to select tasks for readiness
evaluation. At the core of both problems is the same issue of
representativeness and generalization. A standardized, operational
codification of whatever procedures and rules we develop would be of direct
value.

3. An empirically based definition will be developed of the job
performance space in which skill and motivational aspects of job-specific
performance are articulated with respect to training achievement and other
more general (e.g., "Army-wide") indicators of performance.

Traditionally, job/task analysis has tended to focus on relatively discrete
relationships between training performance and subsequent job performance,
between individual aptitude and training, or between aptitude and
performance. Similarly, "performance" has tended to be viewed either as
proficiency in certain skills or, alternatively, as the achievement of more
global goals related to mission accomplishment or other organizational
outcomes. One of the major outcomes of Project A should be the integration
of these elements into a reasonably coherent model.

This is also a matter of some interest from an operational standpoint. In
large organizations such as the Army, where initial training is largely
separated from day-to-day operations in the field, there is a strong
tendency for the two functions to become less and less related to each
other. This problem manifests itself most visibly in the tendency of
operational personnel to discount or dismiss what is taught in training as
irrelevant or inadequate. At least part of the reason for this divergence
is the lack of a common language to describe the job. Project A presents
an opportunity to begin developing such a vocabulary within several MOS,
a language that can be extended not only to other Army jobs, but also to
similar civilian jobs.

4. Evidence will be acquired as to the relative efficiency of alternative
methods of job performance measurement. Since some aspects of job-specific
performance will be measured by more than one method, and since method
development and administration costs will be recorded, the costs and
benefits of the methods can be compared.

By the end of this project, we will know a great deal about the
possibilities and limitations of a broad variety of job performance
measurement techniques. And, because of the variety of jobs and job settings
we will encounter, we will have the additional perspective of having attempted
to develop measures for many types of skills and attributes, including
technical, mechanical, cognitive, and physical requirements. In short, we
should be in a position to write the "handbook" on job performance
measurement. Such a document should be able to compare techniques, costs,
problems of measurement development and implementation, test validity and
reliability, considerations for special populations, and the like. Data will
be obtained concerning the suitability of measurement methods/media to
people with different aptitudes. For example, is a written job knowledge
test as valid a job performance measure for those low in verbal aptitude as
it is for those high in verbal aptitude?

Project A will generate a validated procedure for developing indirect but
feasible proficiency measures, such as performance ratings or knowledge
tests. The project offers a unique opportunity to examine the practical
question of the reliability of indirect measures of job performance (e.g.,
peer ratings or paper-and-pencil tests) as compared to more expensive
direct measurement procedures (e.g., hands-on testing). In addition, the
project will examine these relationships for a large number of tasks,
reflecting many different types of skills and knowledge areas. We should
be able to state the conditions under which less direct measurement methods
can be applied, the level of reliability to be expected, how such measures
can and should be developed, and the acceptable limits of such methods. From
the Army's standpoint, this knowledge could be applied to evaluate existing
efficiency rating systems or to develop new, more reliable systems.
Moreover, the project offers a chance to relate these individual
assessments to unit- or higher-level performance.
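One simple way to express how closely an indirect measure tracks a direct one is the Pearson correlation between the two sets of scores obtained from the same soldiers. The sketch below assumes that approach; the plan does not specify the statistic. The six soldiers' hands-on and peer-rating scores are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six soldiers on one task.
hands_on = [62, 70, 75, 81, 88, 93]   # percent of steps performed correctly (direct)
peer_rating = [3, 4, 3, 5, 5, 6]      # 7-point peer rating (indirect)
print(round(pearson_r(hands_on, peer_rating), 2))
```

For these made-up scores r is about .88.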




An algorithm will be developed for matching test method to test
characteristics. Such an algorithm will allow test developers with subject
matter expertise to take advantage of the results of our cost-benefit
analyses without undertaking their own field trials. The algorithm will also
enhance the efficiency of future large-scale job proficiency measurement
programs.

5. A better understanding of the relationship between school-related
skills and knowledges and job-related skills and knowledges should emerge
from this work. If the predictors used relate to one and not the other, a
reassessment of the school curriculum or the criterion performance measures
might be suggested. Since we will be developing our own performance tests,
a situation could arise that would point to the school as the source of a
problem. The table below shows the possible interactions between
school and job performance. Cells A and D are the two that suggest a
school problem.


PERFORMANCE  ON  THE  JOB  j 

School 

Parformanca 

Low 

High 

High 

A 

B 

Low 

C 

0 

The role of factors not related to job performance per se, but to the
context in which the job is carried out, should be clarified.
To what extent does poor job performance (as in cells A and C above) relate
to these factors? How can these factors be measured for criterion
purposes? Cell D suggests that some people, at least, can perform well
despite poor school performance; what contextual factors account for the
improvement? Clearly, major changes in curricula and other factors
may be needed to substantially raise the percentage of soldiers who do well
on the job.
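The school-by-job table can be applied to individual records once cut points for "high" and "low" are chosen. The sketch below assumes median splits on hypothetical school and job scores; the plan does not define the cut points. It counts soldiers in each cell and treats cells A and D as the ones that flag a possible school problem. The function names and data are illustrative only.

```python
from collections import Counter
from statistics import median


def classify(school, job, school_cut, job_cut):
    """Return the cell (A-D) of the school-by-job table for one soldier."""
    high_school = school >= school_cut
    high_job = job >= job_cut
    if high_school:
        return "B" if high_job else "A"
    return "D" if high_job else "C"


def cell_counts(records):
    """Count soldiers per cell, using median splits as the (assumed) cut points."""
    school_cut = median(s for s, _ in records)
    job_cut = median(j for _, j in records)
    return Counter(classify(s, j, school_cut, job_cut) for s, j in records)


# Hypothetical (school score, job score) pairs.
records = [(90, 85), (88, 40), (55, 50), (60, 92), (75, 70), (45, 30)]
counts = cell_counts(records)
print(sorted(counts.items()))
print("possible school problem:", counts["A"] + counts["D"])
```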

An important potential of the research is the opportunity to explore the
relationship between the knowledges, skills, and abilities to do the job
("can do") and the soft skills, or "will do" aspects of the world of work.
While the project may not find the solution, it should produce correlates
of the soft skills that could be used in further studies aimed at this
elusive goal.

6. A better understanding of test bias for race and gender should result
from the project. Since the MOS have been selected to include substantial
representation by Blacks, Hispanics, and women, field data will provide a
unique opportunity to study the interplay of task type, test method, and job
experience for the various subgroups.

Operational  Outcomes 

1. The MOS taxonomy will be defined in terms of similar job activities,
which may be used to augment or refine Career Management Fields as a
personnel management tool. Current MOS descriptions suffer from a lack of
standardized terms, definitions, and conceptual categories. Tasks, duties,
and duty positions are described differently between and within MOS. The
MOS taxonomy will provide a consistent, unified, and standardized system and
language for describing jobs, tasks, and duties, as well as the underlying
skills and knowledge categories needed to perform those tasks. This will
make it possible to compare and possibly refine MOS. Such an outcome is a
prerequisite for expanding the findings of this project beyond the 19 MOS
studied in Project A.

2. An appraisal will be made of existing job/task analytic systems
employed by the Army, such as the AOSP survey. The project provides an
opportunity to view the current AOSP process across several MOS. By-products
of this review would include a comparison of AOSP job
descriptions with other authoritative descriptions (e.g., AR 611-201,
Soldier's Manuals); the identification of redundancies and varying levels
of detail within and between job descriptions; and improved consistency in the
use of terms and concepts among several surveys. Beyond this, the project
could provide a basis for standardizing some aspects of the surveys, such as
the polling of incumbents for specific skills, knowledge, and physical
requirements of their jobs.

3. Knowledge of the relationship between ASVAB measures (new and
old) and performance on tasks will also be of great value to the military
in developing test profiles for MOS not included in the study and for
emerging MOS. Since the validity of the ASVAB tests for a wide range of
tasks will be known, additional or emerging MOS can be described in terms
of those tasks, and an appropriate ASVAB profile determined. For example,
suppose MOS were described by tasks as follows:


[Matrix of Tasks 1-9 by MOS, with an X marking each task included in an MOS]

Tests valid for MOS-A should also be valid for MOS-G, but in addition, the
tests associated with Tasks 6 and 9 should also be used for MOS-G. If such
a modular approach can be developed on the basis of known validities of tests
for tasks, the testing for MOS in the future can become more directed, and
eventually computerized. As an enlistment applicant passes a test, all MOS
that include the task(s) covered by that test become options for him or her.
Upon completion of the testing, the recruiter will have an applicant
profile and a printout of all MOS for which the applicant is eligible, and
the "best fits" in terms of test scores and/or expected net utilities.
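The modular idea can be sketched as a set-coverage check. Each MOS is a set of tasks, and each test is valid for a set of tasks. An applicant qualifies for an MOS when the tests he or she passed cover all of its tasks. That all-tasks rule is our reading of the MOS-A/MOS-G example, not a rule stated in the plan. MOS-B, the task sets, and tests T1 through T4 are hypothetical, and ranking by "best fit" or expected net utility is left out.

```python
# Hypothetical task composition of three MOS (tasks numbered 1-9 as in the example).
MOS_TASKS = {
    "MOS-A": {1, 2, 3},
    "MOS-B": {2, 4, 5},
    "MOS-G": {1, 2, 3, 6, 9},   # MOS-A's tasks plus Tasks 6 and 9
}

# Hypothetical mapping from each test to the tasks for which it is valid.
TEST_TASKS = {
    "T1": {1, 2},
    "T2": {3},
    "T3": {4, 5},
    "T4": {6, 9},
}


def eligible_mos(passed_tests):
    """Return, sorted, the MOS whose every task is covered by a passed test."""
    covered = set().union(*(TEST_TASKS[t] for t in passed_tests))
    return sorted(mos for mos, tasks in MOS_TASKS.items() if tasks <= covered)


print(eligible_mos({"T1", "T2"}))        # ['MOS-A']
print(eligible_mos({"T1", "T2", "T4"}))  # ['MOS-A', 'MOS-G']
```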

4. The measurement instruments produced in the project will themselves be
an important outcome. Though developed for only a sample of MOS, the tests
and rating instruments could serve as measures of training achievement and
job proficiency.

For the particular MOS investigated, Task 5 research will result in a large
set of reliable, valid, and user-tested instruments, designed in a
scientifically sound manner. Thus, instrument administration procedures,
task conditions and standards of performance, equipment requirements,
scoring rules, and scorer qualifications and procedures will have been
codified and standardized. They will be immediately useful for evaluating
training and job proficiency, diagnosing weaknesses and strengths in school
training, diagnosing requirements for OJT or refresher training, and assessing
unit readiness. The instruments can have other uses in units. For example,
unit Training Officers (or unit Commanders) might use a test battery to
screen arrivals for specific duty assignments, or use such a battery to
plan unit training exercises or to structure individual soldier
advancement criteria (in lieu of SQT).

5. The procedures used to produce a test or other measurement instrument
will be made public and can be used by appropriate people in other MOS.
There is nothing mystical or esoteric about producing a test; the steps
involved (e.g., listing individual steps, deciding on conditions and
standards, trying out procedures, checking reliability) could easily
be performed by school and/or unit personnel. We could produce a "how-to"
manual, usable by anyone who desires a good test. This manual would be
available to all appropriate audiences.




Other unforeseen by-products of scientific or operational interest are
likely to result from the Task 5 research. Some of these may be born of
serendipity, others of need. The former, by their nature, cannot now be
described. The latter, on the other hand, are illustrated by our
reanalysis of the Army occupational domain, which was undertaken out of a
need to classify MOS for sampling purposes; this work may shed some light
on the general skill requirements underlying the Army's Career Management
Fields.




REFERENCES 


..., Sparer, O.W., & Berliner, D.C. Study of training performance ...
(TR-NAVTRADEVCEN-1449-1). Orlando, FL: Naval Training Device Center, 1964.


Angoff, W.H. Test reliability and effective test length. Psychometrika,
1953, 18, 1-14.


Borman, W.C. Individual differences correlates of accuracy in evaluating
performance effectiveness. Applied Psychological Measurement, 1979,
3, 103-115.


Borman, W.C., Dunnette, M.D., & Johnson, P.D. The development and
evaluation of a behavior-based naval officer performance assessment
package. Minneapolis: Personnel Decisions, Inc., 1974.

Borman, W.C., Hough, L.M., & Dunnette, M.D. Development of behaviorally
based rating scales for evaluating the performance of U.S. Navy
recruiters (TR-...). San Diego, CA: Navy Personnel Research and
Development Center, 1976.

Brumback, G.B., Romashko, T., Hahn, C.P., & Fleishman, E.A. Model
procedures for job analysis, test development and validation.
Washington, D.C.: American Institutes for Research, 1974.

Campbell, J.P., Dunnette, M.D., Arvey, R.D., & Hellervik, L.V. The
development and evaluation of behaviorally based rating scales.
Journal of Applied Psychology, 1973, 57, 15-22.

Christal, R.E. (Chm.) Collecting, analyzing and reporting information
describing jobs and occupations. Symposium presented at the meeting
of the American Psychological Association, Washington, D.C., September
1969.

Christal, R.E. The United States Air Force occupational research project
(AFHRL-TR-73-75). Brooks Air Force Base, Texas: Air Force Human
Resources Laboratory, January 1974.

Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. The dependability
of behavioral measurements: Theory of generalizability for scores and
profiles. New York: Wiley, 1972.

Department  of  the  Army.  Training  management  In  battalions  (TC  2i-5-7). 
Washington,  D.C.:  Author,  19/7. 

DeMaio, J., Parkinson, S., Leshowitz, B., Crosby, J., & Thorpe, J.A.
Visual scanning: Comparisons between student and instructor pilots.
Williams AFB, AZ: Flying Training Division, Air Force Human Resources
Laboratory, June 1976.

Engel, J.D. An approach for standardizing human performance assessment.
Paper presented at the THEMIS Conference, Lubbock, Texas, October 1970.


I 


• 


V'i 


Vi 


4 


Engel, J.D., & Rehder, R.J. A comparison of correlated job and work sample
measures for general vehicle repairmen (HumRRO Technical Report
70-16). Alexandria, Virginia: Human Resources Research Office,
October 1970.

Fine, S.A. Functional job analysis. Personnel Administration and
Industrial Relations, 1955, 2, 1-16.

Flanagan, J.C. The critical incident technique. Psychological Bulletin,
1954, 51, 327-358.

Fleishman, E.A. On the relation between abilities, learning, and human
performance. American Psychologist, 1972, 27, 1017-1032.

Foley, J.P., Jr. Evaluating maintenance performance: An analysis
(AFHRL-TR-74-57(I)). Wright-Patterson Air Force Base, Ohio: Air Force
Human Resources Laboratory, 1974.

Frederiksen, N. Proficiency tests for training evaluation. In R. Glaser
(Ed.), Training research and education. Pittsburgh, Pennsylvania:
University of Pittsburgh Press, 1962.

Harris, A., & Mackie, R.R. Factors influencing the use of practical
performance tests in the Navy. ONR Technical Report 703-1, August 1962.

Harris, J.H., Campbell, R.C., Osborn, W.C., & Boldovici, J.A. Development
of a model job performance test for a combat occupational specialty:
Volume I: Test development; Volume II: Instructions and procedures
for conducting a functionally integrated performance test (HumRRO
Final Report FR-CD(L)-75-6). Alexandria, Virginia: Human Resources
Research Organization, November 1975.

Joreskog, K.G., & Sorbom, D. LISREL V: Analysis of linear structural
relationships by the method of maximum likelihood (Research Report
81-8). Uppsala, Sweden: University of Uppsala, 1981.

Klein, G.A. Phenomenological approach to training (AFHRL-TR-77-42).
Wright-Patterson Air Force Base: Advanced Systems Division,
August 1977.

Klein, G.A. Problems and opportunities in deriving training requirements
for design and utilization of simulators. Proceedings of the First
International Learning Technology Congress and Exposition, July 1976.


Knoop, P.A., & Welde, W.L. Automated pilot performance assessment in the
T-37: A feasibility study (AFHRL-TR-72-6, AD-766 446).
Wright-Patterson AFB, OH: Advanced Systems Division, Air Force Human
Resources Laboratory, April 1973.

Madden, J.M., Hazel, J.T., & Christal, R.E. Worker and supervisor
agreement concerning the worker's job description. Personnel Research
Laboratory, Aerospace Medical Division, Air Force Systems Command,
1964.


Lackland Air Force Base: Air Force Personnel Research Laboratory,
September 1967.


McCormick, E.J., Jeanneret, P.R., & Mecham, R.C. A study of job
characteristics and job dimensions as based on the Position Analysis
Questionnaire (PAQ). Journal of Applied Psychology, 1972, 56,
347-368.

Miller, R.B. Task analysis: Sources and futures. Improving Human
Performance, 1973, 2, 5-27.

Miller, R.B. A method for man-machine task analysis (WADC Technical Report
53-137). Wright-Patterson Air Force Base, Ohio, June 1953.

Montemerlo, M.D., & Eddowes, E. The judgmental nature of task analysis.
Proceedings of the Human Factors Society, 22nd Annual Meeting, 1978.


Osborn, W.C., & Ford, J.P. Research on methods of synthetic performance
testing (HumRRO Final Report FR-CD(L)-76-1). Alexandria, Virginia:
Human Resources Research Organization, January 1977.

Prelewicz, S.J. A report on the state of the art of task analysis.
Unpublished paper, April 5, 1977.

Rose, A.M., Shettel, H.H., Wheaton, G.R., Bolin, S.F., & Barba, M.A.
Evaluating the effectiveness of soldier's manuals: A field study.
Washington, DC: American Institutes for Research, February

Rupe, J.C. Research into basic methods and techniques of Air Force job
analysis - IV (AFPTRC-TN-56-31). Lackland Air Force Base, Texas: Air
Force Personnel and Training Research Center, April 1956.

Schmidt, F.L., Greenthal, A.L., Hunter, J.E., Berner, J.G., & Seaton, F.W.
Job sample versus paper and pencil trades and technical tests:
Adverse impact and examinee attitudes. Personnel Psychology, 1977, 30
(2), 187-197.

Shirkey, E.C. Preliminary validity report of the MOS evaluation test for
medical specialist. Fort Benjamin Harrison, Indiana: U.S. Army Enlisted
Evaluation Center, 1966.


Toquam, J.L., & Borman, W.C. Development of first line supervisor behavior
summary scales. Minneapolis, Minnesota: Personnel Decisions Research
Institute, 1981.


